Abstract

A real time coding system with lookahead consists of a memoryless source, a memoryless channel, an encoder, which encodes
the source symbols sequentially with knowledge of future source symbols up to a fixed finite lookahead, d, with or without feedback
of the past channel output symbols, and a decoder, which sequentially reconstructs the source symbols using the channel output.
The objective is to minimize the expected per-symbol distortion. 

For a fixed finite lookahead $d \ge 1$ we invoke the theory of controlled Markov chains to obtain an average cost optimality
equation (ACOE), the solution of which, denoted by $D(d)$, is the minimum expected per-symbol distortion. With increasing d,
$D(d)$ bridges the gap between causal encoding, $d = 0$, where symbol by symbol encoding-decoding is optimal, and the infinite
lookahead case, $d = \infty$, where Shannon theoretic arguments show that separation is optimal.

We extend the analysis to a system with finite state decoders, with or without noise-free feedback. For a Bernoulli source and 
binary symmetric channel, under Hamming loss, we compute the optimal distortion for various source and channel parameters,
and thus obtain computable bounds on D(d). We also identify regions of source and channel parameters where symbol by symbol 
encoding-decoding is suboptimal. Finally, we demonstrate the wide applicability of our approach by applying it in additional coding 
scenarios, such as the case where the sequential decoder can take cost constrained actions affecting the quality or availability of 
side information about the source. 

Index Terms 

Actions, Average Cost Optimality Equation (ACOE), Beliefs, Bellman Equation, Constrained Markov Decision Process, 
Controlled Markov Chains, Expected Average Distortion, Finite State Decoders, Lagrangian, Lookahead, Optimal Cost, Policy, 
Side Information, Value Iteration, Vending Machine. 

I. Introduction 

A. Motivation and Related Work 

A memoryless source $\{U_1, U_2, \ldots\}$ is to be communicated over a memoryless channel with the objective of minimizing
expected average (per-symbol) distortion, with or without the availability of unit-delay noise-free feedback. The communication 
is in real time and hence the encoding and decoding is sequential, with a fixed finite lookahead of source symbols available 
at the encoder (cf. the setting in Fig. 1). The motivation stems from practical systems such as video streaming, cache
memory devices in computing systems, real time communication systems etc., where the encoder has a fixed buffer of future 
source symbols, and the quality of service demands that encoding and decoding should be in real time. The problem finds its 
applications in other sequential decision systems, where resource allocation should be done on the fly due to adverse effects 
of latency or delay, such as sensor networks, weather-monitoring systems, flow in societal networks such as transportation 
networks, recycling systems, etc. A natural criterion of performance is to minimize the expected average distortion. What is 
the best we can do here? Note that such a framework with real time constraints is not covered by Shannon theory. In classical
Information Theory, encoding of long "typical" sequences in blocks as well as block decoding introduces large delays and 
thus such achievable schemes violate the very premise of bounded or no delay constraint. To answer the question, we invoke 
Markov decision theory and cast our problem and other such variants as discrete time controlled Markov chains with average
cost criterion. 

The problem is well motivated by practical problems of delay constrained source-channel coding and has been of much 
interest in the literature. There have also been many different ways to model the notion of sequential encoding and decoding. 
In the source coding context, causal source codes were studied in [1], [2], [3], which demand the reconstruction to depend
causally on the source symbols. But this is a much weaker constraint, and causal source codes can operate with large delays, as
was pointed out in [1] itself. Causal source codes with side information were studied in [4].

Note that we can transform our setting of limited encoder lookahead of d to that of zero lookahead with a Markov source,
$V_i = U_i^{i+d}$. This transformation puts the problem in the class of sequential encoding-decoding problems with Markov sources.
When the communication horizon is fixed, the structure of optimal encoding and decoding policies with Markov sources
has been studied in [5], [6], [7], [8], [9], [10]. In [11], the authors propose a systematic methodology for such a non-classical
information structure to search for an optimal strategy.

The problem of real time coding and decoding in a semi-stochastic setting, i.e., for individual sequences, was studied in
[12] and [13], while finite state digital systems were the subject of study in [14].

The connection between dynamic programming and information theory has been well exploited. The problem of computing 
the capacity of channels with feedback was formulated as a Markov Decision Process in [15], [16]. The long standing problem
of the capacity of the trapdoor channel (cf. [17], [18]) with feedback was resolved using average cost optimality equations in [19].
Zero error capacity for certain channel coding problems was computed using dynamic programming in [20].



B. Contributions and Organization of the Paper 

The approaches in [5], [6], [7], [8], [9], [10] and [11] are inspired by control theory, which provides tools for finding optimal
schemes and understanding their structure. In this work, we take these tools further to provide more explicit expressions and
bounds for the optimum performance under a given lookahead constraint d. While optimum performance in the case $d = 0$
is easily shown to be attained by "symbol-by-symbol" operations, and the case $d = \infty$ can be answered with the tools of
Shannon theory, for any finite $d \ge 1$, the existing literature does not provide useful analytical values or bounds on the minimum
expected average distortion, $D(d)$. In addition to being amenable to a decision theoretic formulation of Markov sources, as
in the surveyed literature above, the model we consider here is more basic and lends itself to a simpler average cost optimality
equation, which in some cases (cf. Section V) can be computed exactly. While in [7], [8], [10] the emphasis is on expected total
fixed horizon cost, we argue that expected average cost over infinite horizon is a more natural criterion of performance as in 
the sequential encoding and decoding problems, we typically do not know when to stop, and hence we would like to analyze 
the asymptotics of the horizon-independent problem. While the main focus in this work has been to characterize the minimum 
achievable distortion, the average cost optimality equations also characterize sufficient conditions on the optimality of stationary 
(encoding and decoding) policies. 

Note that in our communication problem in Fig. 1 the lookahead is available only at the encoder while the decoder constructs
the estimates causally, instead of a seemingly more general setting where lookahead of $l_e$ is present at the encoder while the decoder
has lookahead $l_d$. However, the performance of any policy/code with encoder and decoder lookahead parameters $(l_e, l_d)$ can be
attained arbitrarily closely by the optimal policy for our setting in Fig. 1 with $d = l_e + l_d$, as pointed out in Section II of
[10]. The authors in [21] consider a communication problem similar to our setting with $l_e = 0$, $l_d = d$ for $d > 0$, per-symbol
distortion $D(d)$, and show that $D(d)$ converges exponentially rapidly to $D(\infty)$ and provide bounds on the exponent. However,
these results are asymptotic in nature and hence different from this work, which gives an explicit exact or approximate characterization
of the values of $D(d)$ for any fixed, possibly small d.

Recently there has been work in the direction of "action in information theory" , i.e. canonical Shannon theoretic models 
with encoder and/or decoder taking cost constrained actions to affect the generation or availability of channel state information, 
side information, feedback etc., cf. action in point to point scenarios in [22], [23], [24], [25], [26] and in multi-terminal systems
in [27], [28]. We revisit the setting of source coding with a side information vending machine, as in [22] (see Fig. 6), for
the case where the encoding is sequential with lookahead and the decoder takes an action $A_v$, sequentially dependent on the encoded
symbols, to get side information about the source through a memoryless channel, $P_{Y|U,A_v}$. The reconstruction of the source is
based upon the current encoded symbol, the current side information symbol and memories storing the past encoded symbols 
and side information symbols. We show that the problem can be formulated as a constrained Markov Decision Process. 

The main contribution of this paper is to cast a large class of limited delay source, channel and joint source-channel
coding problems in the realm of sequential decision theory, to obtain characterizations of the optimum performance via average
cost optimality equations with finite or compact state spaces, and to solve exactly or obtain bounds for the expected average
distortion as a function of lookahead d.

The paper is organized as follows. Section II describes the basic model of problems with lookahead (see Fig. 1): encoding
is sequential using the lookahead and unit-delay noise-free feedback, $X_i(U^{i+d}, Y^{i-1})$, while the decoding depends on the
current channel output and the past memory, $\hat{U}_i(Y_i, Z_{i-1})$. The memory evolves as $Z_i(Z_{i-1}, Y_i)$. We seek to find the minimum
expected average distortion as a function of lookahead, i.e.,

$$D(d) = \inf_{\{X_i(\cdot)\},\{\hat{U}_i(\cdot)\}} \limsup_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\right]. \tag{1}$$



In Section III we present an overview of controlled Markov processes with average cost, the unconstrained case in Section
III-A and constrained control in Section III-B. Section IV studies the case of complete memory, i.e., $Z_i = Y^i$. In Section
IV-A we use the theory of Section III to construct an average cost optimality equation, the solution to which is the average optimal
distortion. In Section IV-B we consider the question "to look or not to lookahead" and specify a sufficient condition under
which symbol by symbol encoding-decoding is optimal for a given source, channel, distortion function and lookahead. This
kind of result in our problem of sequential encoding-decoding with lookahead complements that of "to code or not to code"
of [29]. In Section V we consider the framework with finite state decoders, constructing the corresponding ACOE in Section
V-A. In Section V-B we use relative value iteration to solve the problem exactly for an example of a binary source and binary
symmetric channel under Hamming loss, thereby demonstrating how the average distortion values for this setting can be used to
bound $D(d)$ of Section IV. We also contrast with the extreme cases of no lookahead, $d = 0$, where symbol by symbol policies
are optimal, and $d = \infty$, where Shannon's Separation Theorem [30] determines the minimum expected average distortion. We
also highlight the regions of source-channel parameters where for any finite $d \ge 1$, symbol by symbol encoding-decoding is
strictly suboptimal for a Bernoulli source and binary symmetric channel. Section VI relaxes the assumption of the previous
sections that feedback is present. In Section VII the setting of source coding with a side information vending machine is
considered. Here again, encoding is sequential with lookahead, and the decoder takes cost constrained actions, $A_{v,i}$, sequentially to
get side information about the source through a memoryless channel, $P_{Y|U,A_v}$. The decoding is the optimal reconstruction
$\hat{U}_i(X_i, Y_i, M_{i-1}, N_{i-1})$, where $M_{i-1}$ and $N_{i-1}$ are the memories storing some or all of the past encoded symbols and side
information symbols, respectively. Section VII-A evaluates the case when the encoder also has access to the side information, with
the decoder having complete memory in Section VII-A1, while finite memory decoders are considered in Section VII-A2. Section
VII-B studies the same source coding problem with a side information vending machine but now the encoder has no access to
side information. Section VIII summarizes the methodology developed in this paper of constructing average cost optimality
equations. The paper is concluded in Section IX.



II. Problem Formulation 

We begin by explaining the notation to be used throughout this paper. Let upper case, lower case, and calligraphic
letters denote, respectively, random variables, specific or deterministic values which random variables may assume, and their
alphabets. For two jointly distributed random variables, $X$ and $Y$, let $P_X$, $P_{XY}$ and $P_{X|Y}$ respectively denote the marginal
of $X$, the joint distribution of $(X, Y)$ and the conditional distribution of $X$ given $Y$. $X_m^n$ is a shorthand for the $n - m + 1$ tuple
$\{X_m, X_{m+1}, \ldots, X_{n-1}, X_n\}$. $\mathcal{B}(\mathcal{X})$ denotes the Borel $\sigma$-algebra of a given topological space $\mathcal{X}$. $\mathcal{P}(\mathcal{X})$ denotes the probability
simplex on the finite alphabet $\mathcal{X}$. $C_b(\mathcal{X})$ denotes the set of continuous and bounded functions on the topological space $\mathcal{X}$.
$\mathbf{1}_{\{\cdot\}}$ stands for the indicator function. $\mathbb{N}$ and $\mathbb{R}$ denote the sets of natural and real numbers respectively. We impose the assumption
of finiteness of cardinality on all alphabets of operational significance (source, channel input, channel output, reconstruction),
unless otherwise indicated. The general problem setup, depicted in Fig. 1, consists of the following principal components:
Fig. 1. Real time coding with lookahead. The encoder uses future source symbols up to a fixed finite lookahead, d, and unit-delay noise-free feedback; the decoder
uses the present channel output and past memory for source reconstruction. The complete memory case corresponds to $Z_i = Y^i$, which implies $|\mathcal{Z}_i| = |\mathcal{Y}|^i$.



• Source: Generates i.i.d. source symbols, $\{U_i\}_{i\in\mathbb{N}}$, $U_i \in \mathcal{U}$. The source symbols are distributed $\sim P_U$.

• Channel Encoder: The encoder has access to unit-delay noise-free feedback from the channel output and future source
symbols up to a fixed finite lookahead, d, i.e., $X_i = f_{e,i}(U^{i+d}, Y^{i-1})$, where $f_{e,i}$ is the encoding function, $f_{e,i} : \mathcal{U}^{i+d} \times
\mathcal{Y}^{i-1} \to \mathcal{X}$, $i \in \mathbb{N}$.

• Channel: Given the channel input symbol, $X_i$, and all the source symbols and past channel inputs and outputs,
$(u_1^{\infty}, x^{i-1}, y^{i-1})$, the channel output, $y_i$, is distributed $\sim P_{Y|X}$, i.e.,

$$P(y_i \mid u_1^{\infty}, x^{i}, y^{i-1}) = P_{Y|X}(y_i \mid x_i). \tag{2}$$

• Memory: The decoder cannot make use of all the channel output symbols up to the current time due to memory constraints.
Memory is updated as a function of the past state of the memory and the current channel output, i.e., $Z_i = f_{m,i}(Z_{i-1}, Y_i)$,
where $f_{m,i}$ is the memory update function, $f_{m,i} : \mathcal{Z}_{i-1} \times \mathcal{Y} \to \mathcal{Z}_i$, $i \in \mathbb{N}$. Note that the alphabet $\mathcal{Z}_i$ can grow with
$i$, hence the setup also includes the special case of complete memory, i.e., $Z_i = Y^i$, which implies $|\mathcal{Z}_i| = |\mathcal{Y}|^i$.

• Channel Decoder: The channel decoder uses the current channel output and the past memory state to construct its estimate
of the source symbol, i.e., $\hat{U}_i = f_{d,i}(Z_{i-1}, Y_i)$, where the decoding rule is the map $f_{d,i} : \mathcal{Z}_{i-1} \times \mathcal{Y} \to \hat{\mathcal{U}}$.

The alphabets $\mathcal{U}$, $\mathcal{X}$, $\mathcal{Y}$ and $\hat{\mathcal{U}}$ are assumed to be finite. Let $\Lambda(\cdot,\cdot) : \mathcal{U} \times \hat{\mathcal{U}} \to \mathbb{R}$ denote a distortion function. We assume
for simplicity that $0 \le \Lambda(\cdot,\cdot) \le \Lambda_{\max} < \infty$. Let the tuple $\mu = (f_e, f_m, f_d)$ indicate the sequence of encoding rules,
$\{f_{e,i}\}_{i\in\mathbb{N}}$, memory update rules, $\{f_{m,i}\}_{i\in\mathbb{N}}$, and decoding rules, $\{f_{d,i}\}_{i\in\mathbb{N}}$.
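The information structure of the components above can be sketched as a short simulation loop. This is only an illustrative sketch, assuming a Bernoulli(p) source, a binary symmetric channel with crossover probability delta, and Hamming loss; the function names (`simulate`, `uncoded_encoder`, `no_memory`, `output_decoder`) are hypothetical and the uncoded policy is just one simple choice of $(f_e, f_m, f_d)$, not an optimal one.

```python
import random


def simulate(n, d, p, delta, encoder, memory_update, decoder, z0=None, seed=0):
    """Empirical per-symbol Hamming distortion of one (f_e, f_m, f_d) policy.

    Assumed setup: i.i.d. Bernoulli(p) source, BSC(delta) channel.
    The encoder sees the source up to time i+d and the outputs up to time i-1.
    """
    rng = random.Random(seed)
    # Draw n+d source symbols so the lookahead window is always full.
    u = [1 if rng.random() < p else 0 for _ in range(n + d)]
    y_past = []   # Y^{i-1}, fed back to the encoder with unit delay
    z = z0        # decoder memory Z_{i-1}
    total = 0
    for i in range(n):
        x = encoder(u[: i + d + 1], tuple(y_past))   # X_i = f_e(U^{i+d}, Y^{i-1})
        y = x ^ (1 if rng.random() < delta else 0)   # BSC flips with prob. delta
        u_hat = decoder(z, y)                        # U_hat_i = f_d(Z_{i-1}, Y_i)
        z = memory_update(z, y)                      # Z_i = f_m(Z_{i-1}, Y_i)
        total += int(u_hat != u[i])                  # Hamming loss
        y_past.append(y)
    return total / n


def uncoded_encoder(u_seen, y_past):
    """Send the current source symbol; len(y_past) is the current time index."""
    return u_seen[len(y_past)]


def no_memory(z, y):
    """Trivial memory update: the decoder remembers nothing."""
    return None


def output_decoder(z, y):
    """Reproduce the channel output as the source estimate."""
    return y
```

For instance, `simulate(10000, 2, 0.3, 0.1, uncoded_encoder, no_memory, output_decoder)` estimates the distortion of uncoded transmission; any other policy with the same signatures can be plugged in for comparison.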
Definition 1: [Distortion-Optimal Policy] For a fixed lookahead, d, we define the d-distortion optimal policies, $\mathcal{P}^{opt}(d)$, as the
set of $(f_e, f_m, f_d)$-policies, denoted by $\mu(d)$, which achieve the minimum expected average distortion, i.e.,

$$\mathcal{P}^{opt}(d) = \left\{\mu(d) : \mu(d) = \arg\inf_{\{f_e,f_m,f_d\}} \limsup_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\right]\right\}. \tag{3}$$

The corresponding minimum expected distortion as a function of lookahead, d, is

$$D(d) = \inf_{\{f_e,f_m,f_d\}} \limsup_{N\to\infty} E\left[\frac{1}{N}\sum_{i=1}^{N}\Lambda(U_i,\hat{U}_i)\right]. \tag{4}$$

Our main goal is to characterize $D(d)$ and identify structural properties of the elements of $\mathcal{P}^{opt}(d)$.

Note 1: Note that the inf in the definition of $D(d)$ can equivalently be replaced by min (cf. Appendix A). This implies that
$\mathcal{P}^{opt}(d)$ is non-empty. Taking the limsup in the definition of $D(d)$, while appearing more conservative, is actually inconsequential, as
one obtains the same value of $D(d)$ with a liminf in the definition. This can be argued as follows. Let the
per-symbol expected distortion under a policy $\mu$ up to time $N$ be denoted by $D_\mu^{(N)}$. Denoting by $D^{\sup}(d)$ and $D^{\inf}(d)$ the
distortion criteria with limsup and liminf respectively, we know $D^{\inf}(d) \le D^{\sup}(d)$. We will now show $D^{\inf}(d) \ge D^{\sup}(d)$.
Let a policy $\mu^*$ attain the infimum for $D^{\inf}(d)$ (that such a policy exists follows from the same arguments as above
for the non-emptiness of $\mathcal{P}^{opt}(d)$). This implies (as $\Lambda(\cdot)$ is bounded) that for $\epsilon > 0$, $\exists\, N(\epsilon) > 0$ such that under this policy
$D_{\mu^*}^{(N(\epsilon))} \le D^{\inf}(d) + \epsilon$. Operating such a policy in $b$ blocks,

$$D^{\sup}(d) \le \lim_{b\to\infty} \frac{1}{b}\left(b\, D_{\mu^*}^{(N(\epsilon))}\right) \le D^{\inf}(d) + \epsilon, \tag{5}$$

which implies in the limit $\epsilon \to 0$, $D^{\sup}(d) \le D^{\inf}(d)$.



III. Controlled Markov Process with Average Cost : Background and Preliminaries 

We present here an overview of parts of the controlled Markov process with average cost criterion framework that will 
be applied. First, we present an overview of the unconstrained case where the only objective is to maximize an expected
average reward. We then consider the constrained case where, in addition, the system needs to satisfy certain expected average
cost constraints. 



A. Unconstrained Control 

Here we overview results about general Borel state and action spaces. We refer to [31] for a more complete discussion. The
problem is characterized by the tuple $(\mathcal{S}, \mathcal{A}_s, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g)$ and a discrete time dynamical system,

$$s_t = F(s_{t-1}, a_t, w_t), \tag{6}$$

where the states $s_t$ take values in a finite, countable or in general Borel space $\mathcal{S}$ (called the state space), actions $a_t$ take values
in the admissible action space, $\mathcal{A}_s(s_t)$, which is a subset of a compact subset $\mathcal{A}$ (called the action space) of a Borel space,
and the disturbance, $w_t$, takes values in a measurable space $\mathcal{W}$ (called the disturbance space). The initial state $S_0$ is drawn with
distribution $P_S$ and the disturbance $w_t$ is drawn from the distribution $P_W(\cdot \mid s_{t-1}, a_t)$, which depends on past actions and
states only through the pair $(s_{t-1}, a_t)$. We consider only measurable functions. A policy $\pi$ is defined to be the sequence of
functions, $\pi = (\mu_1, \mu_2, \ldots)$, where $\mu_t$ is the function which maps histories ($\phi_t = (s_0, w_0, \ldots, w_{t-1})$) to actions. The set of history
deterministic policies, $\Pi_{HD}$, is characterized by policies for which actions are generated as $a_t = \mu_t(\phi_t)$. The set of Markov
deterministic policies, $\Pi_{MD}$, is characterized by policies for which actions are generated as $a_t = \mu_t(s_{t-1})$. A set of policies
$\Pi_{SD}$ is referred to as stationary deterministic if it is characterized by a function $\mu : \mathcal{S} \to \mathcal{A}$ such that $\mu_t(\phi_t) = \mu(s_{t-1})$ $\forall\, t$.
Policies can be randomized or deterministic ([31], Section 2.2). The policy sets $\Pi_{HR}$, $\Pi_{MR}$ and $\Pi_{SR}$ respectively stand for
history randomized, Markov randomized and stationary randomized policies. As per our definitions and interests, the largest
class of policies considered henceforth will be history deterministic policies, $\Pi_{HD}$. Let

$$\mathcal{K} = \{(x,a) : x \in \mathcal{S}, a \in \mathcal{A}_s(x)\} \in \mathcal{B}(\mathcal{S} \times \mathcal{A}). \tag{7}$$

Note that if $\mathcal{S}$ and $\mathcal{A}$ are compact subsets of a Borel space, $\mathcal{K}$ is a compact subset $\in \mathcal{B}(\mathcal{S} \times \mathcal{A})$. The dynamics induce a stochastic
transition kernel on $\mathcal{B}(\mathcal{S}) \times \mathcal{K}$, $Q(\cdot|x,a)$, which means that for each $(x,a) \in \mathcal{K}$, $Q(\cdot|x,a)$ is a probability measure on $\mathcal{B}(\mathcal{S})$ and for
each $D \in \mathcal{B}(\mathcal{S})$, $Q(D|\cdot)$ is Borel measurable on $\mathcal{K}$.

The objective is to maximize the expected average reward given a bounded one stage reward function, $g : \mathcal{K} \to \mathbb{R}$, and to find the
optimal policy. The average reward of a policy $\pi$ with a given initial state distribution $\nu$ is defined by

$$J(\nu,\pi) = \liminf_{N\to\infty} E_\nu^\pi\left[\frac{1}{N}\sum_{t=1}^{N} g(S_{t-1}, \mu_t(\phi_t))\right]. \tag{8}$$

The optimal average reward and the optimal policy are defined by

$$J^{opt}(\nu) = \sup_{\pi} J(\nu,\pi) \tag{9}$$

$$\pi^{opt}(\nu) = \{\pi : J(\nu,\pi) = J^{opt}(\nu)\}. \tag{10}$$

Note that in general for a controlled Markov process with average cost criterion, where the state space is infinite, the total
expected average cost might depend on the initial state. However, operationally, since our objective is to minimize the expected
average distortion as in Eq. (4), we can decide to start the system with the best initial state, i.e., the state which yields the best
distortion, in which case the optimal cost and optimal policy will be denoted by $J^{opt}$ and $\pi^{opt}$:

$$J^{opt} = \sup_{\nu} J^{opt}(\nu) \tag{11}$$

$$\pi^{opt} = \{\pi : \exists\, \nu \text{ s.t. } J(\nu,\pi) = J^{opt}\}. \tag{12}$$

We need not dwell on sensitivity of the optimal cost to initial states, as this will not be an issue in our application of this 
framework. However when state space is say finite, irreducible and positive recurrent, average cost is indeed equal for all initial 
states. In general, there can be more than one optimal policy, in which case ties are resolved arbitrarily. 
The following theorem describes the average cost optimality equation (ACOE) for such a process, and relates the optimal 
reward with the optimal stationary deterministic policy. 

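Theorem 1 (cf. Theorem 6.1 of [31]): If $\lambda \in \mathbb{R}$ and a bounded function $h : \mathcal{S} \to \mathbb{R}$ satisfy

$$\lambda + h(s) = \sup_{a\in\mathcal{A}}\left[g(s,a) + \int P_W(dw|s,a)\, h(F(s,a,w))\right] \quad \forall\, s \in \mathcal{S}, \tag{13}$$

then $\lambda = J^{opt}$. Further, if there is a function $\mu : \mathcal{S} \to \mathcal{A}$ such that $\mu(s)$ attains the supremum above for all states, then
$J(\pi) = J^{opt}$ for $\pi = \{\mu_1, \mu_2, \ldots\}$ with $\mu_i(\phi_i) = \mu(s_{i-1})$, $\forall\, i$.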

Note 2: As in [31], the above theorem assumes the conditions of the semi-continuous model ([31], Section 2.4). However, in
the set of problems considered in our paper, all such assumptions will be trivially met, such as the transition kernel being
weakly continuous in $\mathcal{K}$ and the continuity of $g$. For brevity, we omit explicitly mentioning such assumptions before invoking
the above theorem in the sections to follow.
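To make the ACOE of Theorem 1 concrete, the following is a minimal sketch of relative value iteration for the special case of finite state and action spaces, where the integral in Eq. (13) becomes a sum. The function name `relative_value_iteration`, the array layout `P[a, s, s']`, and the choice of reference state are assumptions made for illustration; convergence is only expected under standard conditions (e.g. a unichain, aperiodic model), which this sketch does not check.

```python
import numpy as np


def relative_value_iteration(P, g, ref=0, tol=1e-10, max_iter=100_000):
    """Approximately solve lam + h(s) = max_a [g(s,a) + sum_t P[a,s,t] h(t)].

    P[a, s, t]: probability of moving from state s to state t under action a.
    g[s, a]:    bounded one-stage reward.
    Returns (lam, h, policy) with h normalised so that h[ref] = 0.
    """
    n_states = P.shape[1]
    h = np.zeros(n_states)
    for _ in range(max_iter):
        # q[s, a] = g(s, a) + expected relative value of the next state
        q = g + np.einsum("ast,t->sa", P, h)
        Th = q.max(axis=1)
        h_new = Th - Th[ref]          # subtract the value at the reference state
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    q = g + np.einsum("ast,t->sa", P, h)
    lam = float((q.max(axis=1) - h)[ref])   # gain read off from the ACOE at ref
    return lam, h, q.argmax(axis=1)


# Toy example: action 0 stays, action 1 switches state; reward 1 while in state 1.
P_toy = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
g_toy = np.array([[0.0, 0.0],
                  [1.0, 1.0]])
lam_toy, h_toy, policy_toy = relative_value_iteration(P_toy, g_toy)
```

In the toy example the best stationary policy moves to state 1 and stays there, so the returned gain is the reward collected per step in state 1. In the coding problem the reward would be the negative distortion, so the minimum distortion corresponds to the negative of the gain.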



B. Constrained Control 

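In constrained control, the system is characterized by the tuple $(\mathcal{S}, \mathcal{A}_s, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g, l, \Gamma)$. With all the terms carrying
the same meaning as in the previous subsection, $l = \{l_1(\cdot), \ldots, l_k(\cdot)\}$ and $\Gamma = \{\Gamma_1, \ldots, \Gamma_k\}$ are respectively $k$-dimensional
constraint functions (defined on $\mathcal{K}$) and cost vectors for some $k \in \mathbb{N}$. The dynamics of the system are precisely the same as
in the unconstrained case, the objective here being

$$\text{maximize } J(\nu,\pi) \quad \text{subject to } J_i^{l}(\nu,\pi) \le \Gamma_i \;\; \forall\, i = 1, \ldots, k, \tag{14}$$

where

$$J(\nu,\pi) = \liminf_{N\to\infty} E_\nu^\pi\left[\frac{1}{N}\sum_{t=1}^{N} g(S_{t-1}, \mu_t(\phi_t))\right] \tag{15}$$

is the average reward and

$$J_i^{l}(\nu,\pi) = \liminf_{N\to\infty} E_\nu^\pi\left[\frac{1}{N}\sum_{t=1}^{N} l_i(S_{t-1}, \mu_t(\phi_t))\right] \quad \forall\, i = 1, \ldots, k, \tag{16}$$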



are the constraints. [31] and [32] provide a treatment of this problem but only for denumerable states. We here present the
more general framework of [33], with compact state and action spaces. The Lagrangian, $L$, associated with the problem is
defined as

$$L((\nu,\pi),\lambda) = J(\nu,\pi) + \sum_{i=1}^{k} \lambda_i\left(\Gamma_i - J_i^{l}(\nu,\pi)\right), \tag{17}$$

for any $(\nu,\pi) \in \mathcal{P}(\mathcal{S}) \times \Pi_{HD}$ and $\lambda = (\lambda_1, \ldots, \lambda_k) \in \mathbb{R}_+^k$ (the positive orthant of the $k$-dimensional Euclidean space).

The following theorem gives conditions of optimality of a particular initial state distribution and a policy. 

Theorem 2: [Theorem 2.3 of [33]] Assume the following conditions for the tuple $(\mathcal{S}, \mathcal{A}_s, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g, l, \Gamma)$:
C1 $\mathcal{S}$ and $\mathcal{K}$ are compact.

C2 $g \in C_b(\mathcal{K})$ and $l_i \in C_b(\mathcal{K})$, $\forall\, i = 1, \ldots, k$.
C3 For all $x_n \to x$ and $a_n \to a$, $Q(\cdot|x_n, a_n)$ converges weakly to $Q(\cdot|x,a)$.
C4 (Slater's Condition) There exists a $(\bar{\nu},\bar{\pi}) \in \mathcal{P}(\mathcal{S}) \times \Pi_{HD}$ such that

$$J_i^{l}(\bar{\nu},\bar{\pi}) < \Gamma_i \quad \forall\, i = 1, \ldots, k. \tag{18}$$

Under the conditions C1-C4, the Lagrangian $L(\cdot,\cdot)$ has a saddle point with a randomized stationary policy, i.e., $\exists\, \lambda^* \ge 0$ and
$(\nu^*,\pi^*) \in \mathcal{P}(\mathcal{S}) \times \Pi_{SR}$ such that

$$L((\nu,\pi),\lambda^*) \le L((\nu^*,\pi^*),\lambda^*) \le L((\nu^*,\pi^*),\lambda), \quad \forall\, (\nu,\pi) \in \mathcal{P}(\mathcal{S}) \times \Pi_{HD},\; \lambda \ge 0, \tag{19}$$

which implies (from Theorem 2.1 of [33]) that $(\nu^*,\pi^*)$ is a constrained optimal pair. Further (Theorem 2.2 of [33]),

$$L^* = L((\nu^*,\pi^*),\lambda^*) = \inf_{\lambda \ge 0}\; \sup_{(\nu,\pi)\in\mathcal{P}(\mathcal{S})\times\Pi_{HD}} L((\nu,\pi),\lambda) = \sup_{(\nu,\pi)\in\mathcal{P}(\mathcal{S})\times\Pi_{HD}}\; \inf_{\lambda \ge 0} L((\nu,\pi),\lambda), \tag{20}$$

and $L^*$ is the solution of the problem, or the minimum expected average distortion such that the constraints are satisfied.

Note 3: In all the settings considered henceforth, $\forall\, s \in \mathcal{S}$, $\mathcal{A}_s(s) = \mathcal{A}$; hence, with benign abuse of notation, we will drop
$\mathcal{A}_s$ from the tuple associated with our description.



IV. Real-Time Coding with Limited Lookahead : Complete Memory 

The problem we described in Section II (Fig. 1) is an abstraction of a real time communication problem with the encoder
having a fixed lookahead of the future source symbols and perfect unit-delay feedback of the channel output symbols. In this
section, we show that this problem can be formulated as a controlled Markov chain with average cost criterion, and
derive an optimality equation. Before that, we modify our source to concentrate on an equivalent problem. Note that the i.i.d.
source $\mathbb{S} = \{U_i\}_{i\in\mathbb{N}}$ considered can be replaced by a Markov source $\mathbb{S}_M = \{V_i\}_{i\in\mathbb{N}}$ such that $V_i = U_i^{i+d} \in \mathcal{U}^{d+1}$. Since
the source $\mathbb{S}$ is i.i.d., the transition kernel for this Markov process $\mathbb{S}_M$ from $v = (u_1, u_2, \ldots, u_{d+1})$ to $\tilde{v} = (\tilde{u}_1, \tilde{u}_2, \ldots, \tilde{u}_{d+1})$
is given by

$$K(v,\tilde{v}) = P(\tilde{v}|v) = \mathbf{1}_{\{(u_2,\ldots,u_{d+1}) = (\tilde{u}_1,\ldots,\tilde{u}_d)\}}\, P_U(\tilde{u}_{d+1}). \tag{21}$$

The transition matrix is denoted by $K$. Let us assume the distribution of the initial state is $P_V$. Also there is no loss of optimality
in considering encoding functions to be $|\mathcal{V}|$-dimensional mappings, $\{f_{e,i}(v, V^{i-1}, Y^{i-1})\}_{v\in\mathcal{V}}$. The effective problem with
modified source $\mathbb{S}_M$ is now a real-time communication problem as in Fig. 2 with no lookahead. For this modified problem,
we seek to minimize the average cost,

$$\inf_{\{f_{e,i}\},\{f_{d,i}\}} \limsup_{n\to\infty} \frac{1}{n} E\left[\sum_{i=1}^{n} \Lambda(V_i, \hat{V}_i(Y^i))\right] = \inf_{\{f_{e,i}\}} \limsup_{n\to\infty} \frac{1}{n} E\left[\sum_{i=1}^{n} \Lambda(V_i, \hat{V}_i^{opt}(Y^i))\right], \tag{22}$$

where $\Lambda(V_i, \hat{V}_i^{opt}(Y^i)) = \Lambda(U_i, \hat{U}_i^{opt}(Y^i))$. In this section we construct an average cost optimality equation for the equivalent
problem in Fig. 2 and complete memory, i.e., $Z_i = Y^i$.
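The transformation from the i.i.d. source to the sliding-window Markov source can be illustrated with a small sketch that builds the transition matrix of Eq. (21). The function name `window_kernel` and the representation of the source alphabet as integers `0, ..., m-1` are assumptions made for this illustration.

```python
import itertools

import numpy as np


def window_kernel(p_u, d):
    """Transition matrix K of V_i = (U_i, ..., U_{i+d}) for an i.i.d. source.

    p_u: probability mass function of the source over symbols 0..m-1.
    Returns (states, K), where states lists all windows of length d+1 and
    K[j, k] is the probability of moving from states[j] to states[k].
    """
    m = len(p_u)
    states = list(itertools.product(range(m), repeat=d + 1))
    index = {v: k for k, v in enumerate(states)}
    K = np.zeros((len(states), len(states)))
    for v in states:
        for u_new in range(m):
            # Shift the window by one symbol and append a fresh source symbol;
            # all other transitions keep probability zero (the indicator in Eq. (21)).
            v_next = v[1:] + (u_new,)
            K[index[v], index[v_next]] = p_u[u_new]
    return states, K


# Example: Bernoulli(0.3) source with lookahead d = 2 gives an 8-state chain.
states_ex, K_ex = window_kernel([0.7, 0.3], 2)
```

Each row of the resulting matrix has exactly $|\mathcal{U}|$ non-zero entries, one per possible new symbol entering the window.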
Fig. 2. Equivalent problem to Fig. 1 with the memoryless source $\mathbb{S} = \{U_i\}_{i\in\mathbb{N}}$ transformed to a Markov source, $\mathbb{S}_M = \{V_i\}_{i\in\mathbb{N}}$.
A. Average Cost Optimality Equation

Definition 2 (Bayes Envelope and Bayes Response): Consider a random variable $X$ taking values in a finite alphabet $\mathcal{X}$
with distribution $\sim P_X$, where $\hat{x} \in \hat{\mathcal{X}}$ is our guess. The loss function $\Lambda : \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}$ can be understood as quantifying the
discrepancy between the actual value of $X$ and its estimate. An estimate is good if its expected loss $E[\Lambda(X, \hat{X})]$ is small. We define
the Bayes Envelope as $B(P_X) = \min_{\hat{x}} E_{P_X}[\Lambda(X, \hat{x})]$. This represents the minimal expected loss associated with the best
guess possible. The best guess is called the Bayes Response to $P_X$ and is denoted by $\hat{X}_{Bayes}(P_X) = \arg\min_{\hat{x}} E[\Lambda(X, \hat{x})]$, where
ties are resolved arbitrarily. In the presence of an observation, the optimal estimator of $X$ based on $Y$ in the sense of minimizing
expected loss under $\Lambda$ is given by $\hat{X}_{Bayes}(P_{X|Y}) = \arg\min_{\hat{x}} E[\Lambda(X, \hat{x})|Y]$. Note that in general, the Bayes response depends
on the loss function; this dependence is implied whenever we use the Bayes response.
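As a small numerical illustration of Definition 2, the sketch below computes the Bayes response and Bayes envelope for a finite alphabet, and uses them to evaluate the expected loss of uncoded transmission ($X = U$) followed by the Bayes decoder. The function names and the matrix conventions (`loss[x, x_hat]`, `channel[x, y]`) are assumptions for this example only.

```python
import numpy as np


def bayes_response(p_x, loss):
    """Return (best guess, Bayes envelope) for pmf p_x and loss[x, x_hat]."""
    expected = np.asarray(p_x) @ np.asarray(loss)   # expected loss of each guess
    x_hat = int(np.argmin(expected))                # ties resolved by lowest index
    return x_hat, float(expected[x_hat])


def uncoded_distortion(p_u, channel, loss):
    """Expected loss of sending X = U over channel[x, y] with a Bayes decoder.

    Uses sum_y P(y) B(P_{U|Y=y}) = sum_y min_{u_hat} sum_u P(u, y) loss[u, u_hat].
    """
    joint = np.asarray(p_u)[:, None] * np.asarray(channel)   # P(U = u, Y = y)
    loss = np.asarray(loss)
    return float(sum((joint[:, y] @ loss).min() for y in range(joint.shape[1])))


hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
bsc = lambda delta: np.array([[1 - delta, delta], [delta, 1 - delta]])

guess, envelope = bayes_response([0.7, 0.3], hamming)
d_clean = uncoded_distortion([0.7, 0.3], bsc(0.1), hamming)
d_noisy = uncoded_distortion([0.7, 0.3], bsc(0.4), hamming)
```

With a Bernoulli(0.3) source under Hamming loss, the decoder trusts the channel output when the crossover probability is below 0.3 and otherwise ignores it and guesses the more likely symbol.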
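Lemma 3: The optimal decoding rule for the problem in Fig. 2 is given by

$$\hat{V}_i^{opt}(Y^i) = \hat{V}_{Bayes}(P_{V_i|Y^i}).$$

Proof: Fix $n$ and the encoding rule. From the definition of the Bayes response,

$$E\left[\Lambda(V_i, \hat{V}_i(Y^i)) \,\middle|\, Y^i\right] \ge E\left[\Lambda(V_i, \hat{V}_{Bayes}(P_{V_i|Y^i})) \,\middle|\, Y^i\right], \tag{23}$$

which implies

$$E\left[\Lambda(V_i, \hat{V}_i(Y^i))\right] \ge E\left[\Lambda(V_i, \hat{V}_{Bayes}(P_{V_i|Y^i}))\right]. \tag{24}$$

Thus we have the following lower bound on the expected average cost,

$$\limsup_{n\to\infty} \frac{1}{n} E\left[\sum_{i=1}^{n} \Lambda(V_i, \hat{V}_i(Y^i))\right] \tag{25}$$

$$\ge \limsup_{n\to\infty} \frac{1}{n} E\left[\sum_{i=1}^{n} \Lambda(V_i, \hat{V}_{Bayes}(P_{V_i|Y^i}))\right], \tag{26}$$

which is attained by the decoding rule $\hat{V}_i^{opt}(Y^i) = \hat{V}_{Bayes}(P_{V_i|Y^i})$. Thus the optimal decoding for the original source is $\hat{U}_i^{opt}(Y^i) =
\hat{U}_{Bayes}(P_{U_i|Y^i})$.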

Fix the decoding rule to be the optimal rule $\{\hat{V}_i^{opt}\}_{i\in\mathbb{N}}$ as above. Consider the state sequence for this problem, $S_i =
(V_i, P_{V_i|Y^i}) \in \mathcal{S}$. $P_{V_i|Y^i}$ denotes the belief of the encoder on the source symbol given all the past and the present channel
outputs. Let us denote it by a $|\mathcal{V}|$-dimensional non-negative probability (column) vector $\beta_{Y^i}$. As the source symbols take values
in a finite alphabet, the state space $\mathcal{S}$ is a compact subset of a Borel space. Consider the disturbance to be $W_i = (V_i, Y_i)$, which
takes values in a finite set, $\mathcal{V} \times \mathcal{Y}$. The action is history dependent, $A_i = f_{e,i}(\cdot, V^{i-1}, Y^{i-1}) = f_{e,i}(S_0, W^{i-1})$ (here $S_0$
is some fixed initial state). $P_S$ is some initial distribution. From now on we will use $f_{e,i}(V^{i-1}, Y^{i-1})$ interchangeably with
$f(S_0, W^{i-1})$ to denote $A_i$, as $S_0$ is fixed. The action set is the set of mappings from $\mathcal{V}$ to $\mathcal{X}$, hence $|\mathcal{A}| = |\mathcal{X}|^{|\mathcal{V}|}$, which
is finite. Note,

$$P(W_i \mid S^{i-1}, A^{i}, W^{i-1}) = P(V_i, Y_i \mid S^{i-1}, A^{i}, W^{i-1}) \tag{27}$$

$$\overset{(*)}{=} K(V_{i-1}, V_i)\, P(Y_i \mid X_i = A_i(V_i)) \tag{28}$$

$$= P_W(W_i \mid S_{i-1}, A_i), \tag{29}$$

where (*) follows from the fact that $\{V_i\}_{i\in\mathbb{N}}$ is Markov, and from the DMC property of the channel as in Eq. (2). Hence
$W_i \sim P_W(\cdot \mid S_{i-1}, A_i)$, where $P_W$ is given by Eqs. (28) and (29).



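Lemma 4: Given knowledge of the entire past history of actions, states and disturbances, the current state evolves according
to a deterministic function of the past state, current action and the current disturbance, i.e.,

$$S_i = F(S_{i-1}, A_i, W_i). \tag{30}$$

Proof:

$$P_{V_i = v \mid Y^i} = \frac{P_{V_i = v,\, Y_i \mid Y^{i-1}}}{P_{Y_i \mid Y^{i-1}}} \tag{31}$$

$$= \frac{\sum_{v_{i-1}} P_{v_{i-1},\, v,\, Y_i \mid Y^{i-1}}}{\sum_{v_{i-1}, v_i} P_{v_{i-1},\, v_i,\, Y_i \mid Y^{i-1}}} \tag{32}$$

$$= \frac{\sum_{v_{i-1}} P_{v_{i-1} \mid Y^{i-1}}\, K(v_{i-1}, v)\, P(Y_i \mid X_i = A_i(v))}{\sum_{v_{i-1}, v_i} P_{v_{i-1} \mid Y^{i-1}}\, K(v_{i-1}, v_i)\, P(Y_i \mid X_i = A_i(v_i))}. \tag{33}$$

Therefore,

$$\beta_{Y^i} = \left[P_{V_i = v \mid Y^i}\right]_{v\in\mathcal{V}} \tag{34}$$

$$= \left[\frac{\beta_{Y^{i-1}}^{T} K(v)\, P(Y_i \mid A_i(v))}{\sum_{\tilde{v}} \beta_{Y^{i-1}}^{T} K(\tilde{v})\, P(Y_i \mid A_i(\tilde{v}))}\right]_{v\in\mathcal{V}} \tag{35}$$

$$= G(\beta_{Y^{i-1}}, A_i, Y_i), \tag{36}$$

where $K(v) = [K(\tilde{v}, v)]_{\tilde{v}\in\mathcal{V}}$ is a column vector. Since $W_i = (V_i, Y_i)$ and $S_i = (V_i, \beta_{Y^i})$, Eq. (36) implies

$$S_i = (V_i, \beta_{Y^i}) \tag{37}$$

$$= (V_i, G(\beta_{Y^{i-1}}, A_i, Y_i)) \tag{38}$$

$$= F(S_{i-1}, A_i, W_i). \tag{39}$$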
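The belief recursion $G$ of Eqs. (34)-(36) is a standard predict-and-correct filtering step, and can be sketched as follows. The function name `belief_update` and the array conventions (`K[v_prev, v]`, `channel[x, y]`, and `action[v]` giving the channel input assigned to window `v`) are assumptions for this illustration.

```python
import numpy as np


def belief_update(beta, K, channel, action, y):
    """G(beta, a, y): posterior of V_i given Y^i, from the posterior of V_{i-1}.

    beta[v_prev]  = P(V_{i-1} = v_prev | Y^{i-1})
    K[v_prev, v]  = transition probability of the window chain
    channel[x, y] = P(Y = y | X = x)
    action[v]     = channel input the encoder sends when the window equals v
    """
    prior = np.asarray(beta) @ np.asarray(K)            # P(V_i = v | Y^{i-1})
    likelihood = np.asarray(channel)[np.asarray(action), y]   # P(Y_i = y | X_i = a(v))
    unnorm = prior * likelihood
    total = unnorm.sum()
    if total == 0:
        raise ValueError("output y has zero probability under this belief and action")
    return unnorm / total


# Example with d = 0: the window is a single Bernoulli(0.3) symbol sent uncoded
# over a BSC with crossover probability 0.1, and the output y = 1 is observed.
K0 = np.array([[0.7, 0.3], [0.7, 0.3]])
bsc01 = np.array([[0.9, 0.1], [0.1, 0.9]])
posterior = belief_update([0.5, 0.5], K0, bsc01, action=[0, 1], y=1)
```

Because the update depends on the past only through the previous belief, the action, and the new output, the pair (window, belief) is a valid state for the controlled Markov process, which is the content of Lemma 4.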



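Let

$$g(S_i, A_{i+1}) = -E\left[\Lambda(V_i, \hat{V}_{Bayes}(P_{V_i|Y^i})) \,\middle|\, Y^i\right]. \tag{40}$$

Therefore, since $g$ is bounded and boundary terms vanish in the average,

$$\inf \limsup_{n\to\infty} \frac{1}{n} E\left[\sum_{i=1}^{n} \Lambda(V_i, \hat{V}_i^{opt}(Y^i))\right] \tag{41}$$

$$= -\sup \liminf_{n\to\infty} \frac{1}{n} E\left[\sum_{t=1}^{n} g(S_{t-1}, A_t)\right]. \tag{42}$$

Hence the tuple $\mathcal{T} = (\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g)$ forms a controlled Markov process. The problem of finding the best channel
encoder (using the optimal decoder, the Bayesian $\hat{V}_i(Y^i) = \hat{V}_{Bayes}(P_{V_i|Y^i})$) in our problem of real time communication
is equivalent to the problem of finding the optimal policy for the tuple $\mathcal{T}$ which maximizes the average reward under the reward
function $g$. The optimal reward is given by

$$\lambda^{opt} = \sup \liminf_{n\to\infty} \frac{1}{n} E\left[\sum_{t=1}^{n} g(S_{t-1}, A_t)\right]. \tag{43}$$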

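Thus the ACOE for the controlled Markov process $\mathcal{T} = (\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g)$, which has the generic form

$$\lambda + h(s) = \sup_{a\in\mathcal{A}}\left[g(s,a) + \sum_{w\in\mathcal{W}} P_W(w|s,a)\, h(F(s,a,w))\right], \tag{44}$$

when specialized to our setting becomes

$$\lambda + h(v,\beta) = \sup_{a\in\mathcal{A}}\left[g(v,\beta,a) + \sum_{w\in\mathcal{W}} P_W(w|v,\beta,a)\, h(F(v,\beta,a,w))\right] \quad \forall\, v \in \mathcal{V},\; \beta \in \mathcal{P}(\mathcal{V}), \tag{45}$$

which becomes, upon substitution from Eq. (28),

$$\lambda + h(v,\beta) = c(\beta) + \sup_{a\in\mathcal{A}}\left[\sum_{(\tilde{v},y)\in\mathcal{V}\times\mathcal{Y}} K(v,\tilde{v})\, P_{Y|X}(y \mid a(\tilde{v}))\, h(\tilde{v}, G(\beta,a,y))\right] \quad \forall\, v \in \mathcal{V},\; \beta \in \mathcal{P}(\mathcal{V}). \tag{46}$$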
_(v,y)evxy 

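We will now transform back the setting from the Markov source $\{V_i\}_{i\in\mathbb{N}}$ to the i.i.d. source $\{U_i\}_{i\in\mathbb{N}}$. Let us denote

$$v = (u_1, u_2, \ldots, u_{d+1}) \quad \text{and} \quad \beta = \left[\beta(u_1, \ldots, u_{d+1})\right]_{(u_1,\ldots,u_{d+1})\in\mathcal{U}^{d+1}}. \tag{47}$$

Note that

$$E\left[\Lambda(V_i, \hat{V}_i^{opt}(Y^i)) \,\middle|\, Y^i\right] = E\left[\Lambda(V_i, \hat{V}_{Bayes}(P_{V_i|Y^i})) \,\middle|\, Y^i\right] \tag{48}$$

$$= E\left[\Lambda(U_i, \hat{U}_i^{opt}(Y^i)) \,\middle|\, Y^i\right] \tag{49}$$

$$= E\left[\Lambda(U_i, \hat{U}_{Bayes}(P_{U_i|Y^i})) \,\middle|\, Y^i\right] \tag{50}$$

$$\overset{(*)}{=} \min_{\hat{u}\in\hat{\mathcal{U}}} \sum_{u\in\mathcal{U}} P_{U_i|Y^i}(u)\, \Lambda(u,\hat{u}), \tag{51}$$

where (*) follows from Definition 2. Hence,

$$g(s,a) = c(\beta) = -\min_{\hat{u}\in\hat{\mathcal{U}}} \sum_{u\in\mathcal{U}} \beta_1(u)\, \Lambda(u,\hat{u}), \tag{52}$$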
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where @i{u) — Y^(u 2 ■■■ u d+1 )&i d ^( w > ' ' ' > ^ci+i) is the marginal of /3 on the first component. Note that g(-) is continuous 
in f3. Thus the transformed ACOE to our original problem with i.i.d. source, 



A + h(m, ■ ■ ■ ,u d+ x,P) 
= — min y (3i(u)A(u, u) + max 



u&A 



Y Pu(u)P Y \x(y\a(u2, ■■■ , u d+1 ,u))h(u 2 . ■■■ , u d+1 ,u, G((3, a, y)) 



(53) 



Note 4 (Structure of Optimal Policy): As $\mathcal{A}$ is finite, we have replaced the sup with max in Eq. (53). Specializing Theorem 1 to the above ACOE, if there exist a constant $\lambda^*$ and a measurable, real valued, bounded function $h : \mathcal{S} \to \mathbb{R}$ such that Eq. (53) is satisfied for all $(u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1}$, $\beta \in \mathcal{P}(\mathcal{U}^{d+1})$, then the minimum distortion is $D(d) = -\lambda^{\mathrm{opt}} = -\lambda^*$. Further, if there exists a function $\mu : \mathcal{S} \to \mathcal{A}$ such that the maximum for all states in Eq. (53) is attained by $\mu(s) = \mu(u^{d+1}, \beta)$, then the optimal encoding policy is stationary and depends on history only through the past state, i.e., $A_i = \mu(\cdot\,; S_{i-1}) = \mu(\cdot\,; U_{i-1}^{i+d-1}, \beta_{Y^{i-1}})$, so that the input to the channel at the $i$-th time epoch is $X_i = A_i(U_i^{i+d}) = \mu(U_i^{i+d}, U_{i-1}^{i+d-1}, \beta_{Y^{i-1}}) = \mu(U_{i-1}, \ldots, U_{i+d}, \beta_{Y^{i-1}})$. Hence the optimal encoding in this case is a stationary mapping into $\mathcal{X}$ which uses only the $d + 2$ source symbols $U_{i-1}^{i+d}$ and the belief $\beta_{Y^{i-1}}$, which is updated by Eq. (36).



B. To Look or Not to Lookahead : Optimality of Symbol by Symbol Policies 

In this section we derive conditions for stationary, symbol by symbol policies to be optimal. This means that we seek to identify situations where the optimal encoding at time $i$ is given by $X_i = \mu_{\mathrm{symbol}}(U_i)$.

Lemma 5: When the lookahead is $d = 0$, the minimum average distortion is achieved by symbol-by-symbol encoding (and decoding) and is given by $D_{\mathrm{symbol}} = \min_{X : \mathcal{U} \to \mathcal{X}} E\left[ \Lambda\left(U, \hat U_{\mathrm{Bayes}}(P_{U|Y})\right) \right]$.

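Proof: Consider the communication system in Fig. 1 with lookahead $d = 0$. This corresponds to a communication system with memoryless source and memoryless channel, and causal encoding and causal decoding with unit delay feedback. We will first use standard information theoretic methods to prove,

$$D_{\mathrm{symbol}} = \min_{X : \mathcal{U} \to \mathcal{X},\ \hat U : \mathcal{Y} \to \hat{\mathcal{U}}} E\left[ \Lambda(U, \hat U(Y)) \right]. \tag{54}$$

Achievability:

Let $D_{\min}$ denote the minimum distortion. Clearly $D_{\mathrm{symbol}}$ is achievable by the encoding $X(\cdot)$ and decoding $\hat U(\cdot)$ which attain the minimum in Eq. (54). Hence $D_{\min} \le D_{\mathrm{symbol}}$.

Converse:

Consider the following chain of inequalities to prove $D_{\min} \ge D_{\mathrm{symbol}}$. Let $D$ be the distortion achieved by any causal encoding and causal decoding. Also note that minimizing over functions of the form $f_{e,i}(U^i)$ and $f_{d,i}(Y^i)$ is equivalent to minimizing over vector valued mappings of the form $f_{e,i}(\cdot, U^{i-1}) : \mathcal{U} \to \mathcal{X}$ and $f_{d,i}(\cdot, Y^{i-1}) : \mathcal{Y} \to \hat{\mathcal{U}}$.

\begin{align}
D &= \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\left[ \Lambda(U_i, \hat U_i) \right] \tag{55} \\
&= \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\left[ E\left[ \Lambda(U_i, \hat U_i) \mid U^{i-1}, Y^{i-1} \right] \right] \tag{56} \\
&= \limsup_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} E\left[ E\left[ \Lambda(U_i, \hat U_i) \mid f_{e,i}, f_{d,i}, U^{i-1}, Y^{i-1} \right] \right]. \tag{57}
\end{align}

Note that,

\begin{align}
E\left[ \Lambda(U_i, \hat U_i) \mid f_{e,i}, f_{d,i}, U^{i-1}, Y^{i-1} \right] &= \sum_{u \in \mathcal{U},\, y \in \mathcal{Y}} P_U(u)\, P_{Y|X}(y \mid f_{e,i}(u))\, \Lambda(u, f_{d,i}(y)) \tag{58} \\
&\ge \min_{X, \hat U} \sum_{u \in \mathcal{U},\, y \in \mathcal{Y}} P_U(u)\, P_{Y|X}(y \mid X(u))\, \Lambda(u, \hat U(y)) \tag{59} \\
&= D_{\mathrm{symbol}}, \tag{60}
\end{align}

which implies $D \ge D_{\mathrm{symbol}}$ for all achievable distortions, which in turn implies $D_{\min} \ge D_{\mathrm{symbol}}$. What is left is to show that,

$$D_{\mathrm{symbol}} = \min_{X : \mathcal{U} \to \mathcal{X},\ \hat U : \mathcal{Y} \to \hat{\mathcal{U}}} E\left[ \Lambda(U, \hat U(Y)) \right] = \min_{X : \mathcal{U} \to \mathcal{X}} E\left[ \Lambda\left(U, \hat U_{\mathrm{Bayes}}(P_{U|Y})\right) \right], \tag{61}$$

which is equivalent to showing that for any encoding rule the optimal decoding rule $\hat U$ is the Bayes response $\hat U_{\mathrm{Bayes}}(P_{U|Y})$, which follows from the definition of the Bayes response. ■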



The above proof shows that if a stationary symbol by symbol policy is optimal for the controlled Markov process of Section IV-A, then the optimal reward is given by,

$$\lambda_{\mathrm{symbol}} = -\min_{X : \mathcal{U} \to \mathcal{X}} E\left[ \Lambda\left(U, \hat U_{\mathrm{Bayes}}(P_{U|Y})\right) \right]. \tag{62}$$

Note that the joint distribution of $(U, Y)$ on the right hand side of Eq. (62), and hence the expected loss, depends on the encoding rule $X$. To simplify the notation we denote $\Lambda(U_i, \hat U_{\mathrm{Bayes}}(\cdot))$ by $\Lambda(U_i, \cdot)$; the Bayes response is implied in this notation. Also, for a given source $P_U$, channel $P_{Y|X}$, and a symbol by symbol encoding policy $\mu_s$, i.e., $X_i = \mu_s(U_i)$, let $f_{\mu_s}[P_U, y]$ denote the posterior $P_{U|y}$ when the source is distributed as $P_U$ and the encoding policy is $\mu_s$ through the channel $P_{Y|X}$. For brevity we omit $P_{Y|X}$ from the argument of $f_{\mu_s}(\cdot)$, though the posterior depends on the channel as well. Hence if $\mu_s$ is the minimizer in Eq. (62), then $\lambda_{\mathrm{symbol}}$ is given by $-E\left[ \Lambda(U, f_{\mu_s}[P_U, Y]) \right]$. To state our next result, pertaining to the optimality of symbol by symbol coding, we introduce another bit of notation. The evolution of the posterior is through the function $G(\beta, a, y)$, i.e.,

\begin{align}
\tilde\beta &= G(\beta, a, y) \tag{63} \\
&= \left[ \frac{\beta^T K(v)\, P(y \mid a(v))}{\sum_{\tilde v \in \mathcal{V}} \beta^T K(\tilde v)\, P(y \mid a(\tilde v))} \right]_{v \in \mathcal{V}}. \tag{64}
\end{align}

Also, for a distribution $\beta$, let $B(\beta) = \min_{\hat u} \sum_{u} \beta(u)\, \Lambda(u, \hat u)$, which is the Bayes envelope for the given loss function.
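As a small illustration of the Bayes envelope, the following sketch evaluates $B(\beta)$ for a finite alphabet. The function name `bayes_envelope`, the matrix layout of the loss and the example numbers are assumptions chosen here for illustration only.

```python
def bayes_envelope(beta, loss):
    """B(beta) = min over reconstructions u_hat of sum_u beta[u] * loss[u][u_hat]."""
    n_hat = len(loss[0])
    return min(
        sum(b * loss[u][u_hat] for u, b in enumerate(beta))
        for u_hat in range(n_hat)
    )


# Hamming loss on a binary alphabet (hypothetical example belief).
hamming = [[0, 1], [1, 0]]
value = bayes_envelope([0.7, 0.3], hamming)
```

Under Hamming loss the envelope reduces to one minus the largest belief mass, which is what the example evaluates.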

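Theorem 6: Denote the encoding function which achieves the minimum in Eq. (54) by $\mu_s$. For the problem setup depicted in Fig. 1 with the ACOE in Eq. (53), for a given positive lookahead $d \ge 1$, a stationary symbol by symbol policy is optimal if the following holds:

\begin{align}
&\sum_{(u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(u)\, P_{Y|X}(y \mid \mu_s(u))\, B(f_{\mu_s}[P_U, y]) + \sum_{k=2}^{d+1} \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid \mu_s(u_k))\, B(f_{\mu_s}[\beta_k, y]) \nonumber \\
&= \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a^*(u_2^{d+1}, \tilde u))\, B(\tilde\beta_1) \nonumber \\
&\quad + \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a^*(u_2^{d+1}, \tilde u)) \sum_{k=2}^{d} \left[ \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(u_{k+1}))\, B(f_{\mu_s}[\tilde\beta_k, \tilde y]) \right] \nonumber \\
&\quad + \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a^*(u_2^{d+1}, \tilde u)) \left[ \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(\tilde u))\, B(f_{\mu_s}[\tilde\beta_{d+1}, \tilde y]) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{U}^{d+1}), \tag{65}
\end{align}

where $a^*(\cdot)$ is the minimizer of the right hand side of the above equation, $\tilde\beta = G(\beta, a, y)$, and $\beta_k$ and $\tilde\beta_k$ denote the marginals of the $k$-th component of $\beta$ and $\tilde\beta$, respectively.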
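Proof:

We will first assume that the symbol by symbol policy is optimal, i.e., the optimal encoding is $X_i = \mu_i(U_i^{i+d}) = \mu_s(U_i^{i+d}, S_{i-1}) = \mu_s(U_i^{i+d}) = \mu_s(U_i)$. Hence we can solve for an $h$ that satisfies the following equation,

\begin{align}
\lambda_{\mathrm{symbol}} + h(u_1, \ldots, u_{d+1}, \beta) &= -\min_{\hat u} \sum_{u \in \mathcal{U}} \beta_1(u)\, \Lambda(u, \hat u) + \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2))\, h(u_2, \ldots, u_{d+1}, \tilde u, \tilde\beta) \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{U}^{d+1}),\ \tilde\beta = G(\beta, \mu_s, y). \tag{66}
\end{align}

We claim that for a given lookahead $d$, Eq. (66) is satisfied with,

$$h(u_1, \ldots, u_{d+1}, \beta) = -B(\beta_1) - \sum_{k=2}^{d+1} \left[ \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid \mu_s(u_k))\, B(f_{\mu_s}[\beta_k, y]) \right], \tag{67}$$

where $\beta_k$ is the marginal $\beta_k(u_k) = \sum_{(u_1^{k-1},\, u_{k+1}^{d+1})} \beta(u_1, \ldots, u_{d+1})$. To prove the claim, consider the L.H.S. of Eq. (66),

\begin{align}
\mathrm{LHS} &= \lambda_{\mathrm{symbol}} + h(u_1, \ldots, u_{d+1}, \beta) \tag{68} \\
&= -E\left[ \Lambda(U, f_{\mu_s}[P_U, Y]) \right] - B(\beta_1) - \sum_{k=2}^{d+1} \left[ \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid \mu_s(u_k))\, B(f_{\mu_s}[\beta_k, y]) \right] \tag{69} \\
&= -\sum_{(u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(u)\, P_{Y|X}(y \mid \mu_s(u))\, B(f_{\mu_s}[P_U, y]) - B(\beta_1) - \sum_{k=2}^{d+1} \left[ \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid \mu_s(u_k))\, B(f_{\mu_s}[\beta_k, y]) \right]. \tag{70}
\end{align}

Before evaluating the right hand side of Eq. (66), we evaluate the marginals $\tilde\beta_k$:

\begin{align}
\tilde\beta(u_1^{d+1}) &= \frac{\beta^T K(u_1^{d+1})\, P_{Y|X}(y \mid \mu_s(u_1^{d+1}))}{\sum_{\tilde u_1^{d+1}} \beta^T K(\tilde u_1^{d+1})\, P_{Y|X}(y \mid \mu_s(\tilde u_1^{d+1}))} \tag{71} \\
&= \frac{\sum_{\tilde u} \beta(\tilde u, u_1^{d})\, P_U(u_{d+1})\, P_{Y|X}(y \mid \mu_s(u_1^{d+1}))}{\sum_{\tilde u_1^{d+1}} \sum_{\tilde u} \beta(\tilde u, \tilde u_1^{d})\, P_U(\tilde u_{d+1})\, P_{Y|X}(y \mid \mu_s(\tilde u_1^{d+1}))} \tag{72} \\
&= \frac{\sum_{\tilde u} \beta(\tilde u, u_1^{d})\, P_U(u_{d+1})\, P_{Y|X}(y \mid \mu_s(u_1))}{\sum_{\tilde u_1^{d+1}} \sum_{\tilde u} \beta(\tilde u, \tilde u_1^{d})\, P_U(\tilde u_{d+1})\, P_{Y|X}(y \mid \mu_s(\tilde u_1))}. \tag{73}
\end{align}

Hence the marginals,

\begin{align}
\tilde\beta_k(u_k) = \sum_{(u_1^{k-1},\, u_{k+1}^{d+1})} \tilde\beta(u_1^{d+1}) &= f_{\mu_s}[\beta_2, y](u_1), \quad k = 1 \tag{74} \\
&= \beta_{k+1}(u_k), \quad 2 \le k \le d \tag{75} \\
&= P_U(u_{d+1}), \quad k = d+1. \tag{76}
\end{align}

Thus the RHS of Eq. (66),

\begin{align}
\mathrm{RHS} &= -B(\beta_1) + \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2))\, h(u_2, \ldots, u_{d+1}, \tilde u, \tilde\beta) \tag{77} \\
&= -B(\beta_1) - \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2))\, B(\tilde\beta_1) \nonumber \\
&\quad - \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2)) \sum_{k=2}^{d} \left[ \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(u_{k+1}))\, B(f_{\mu_s}[\tilde\beta_k, \tilde y]) \right] \nonumber \\
&\quad - \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2)) \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(\tilde u))\, B(f_{\mu_s}[\tilde\beta_{d+1}, \tilde y]) \tag{78} \\
&= -B(\beta_1) - \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2))\, B(f_{\mu_s}[\beta_2, y]) \nonumber \\
&\quad - \sum_{k=2}^{d} \left[ \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2)) \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(u_{k+1}))\, B(f_{\mu_s}[\beta_{k+1}, \tilde y]) \right] \nonumber \\
&\quad - \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid \mu_s(u_2)) \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(\tilde u))\, B(f_{\mu_s}[P_U, \tilde y]) \tag{79} \\
&= -B(\beta_1) - \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid \mu_s(u_2))\, B(f_{\mu_s}[\beta_2, y]) - \sum_{k=2}^{d} \left[ \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \mu_s(u_{k+1}))\, B(f_{\mu_s}[\beta_{k+1}, \tilde y]) \right] \nonumber \\
&\quad - \sum_{(\tilde u,\tilde y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(\tilde y \mid \mu_s(\tilde u))\, B(f_{\mu_s}[P_U, \tilde y]) \tag{80} \\
&= -B(\beta_1) - \sum_{k=2}^{d+1} \left[ \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid \mu_s(u_k))\, B(f_{\mu_s}[\beta_k, y]) \right] - \sum_{(\tilde u,\tilde y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(\tilde y \mid \mu_s(\tilde u))\, B(f_{\mu_s}[P_U, \tilde y]) \tag{81} \\
&= \mathrm{LHS}. \tag{82}
\end{align}

Thus Eq. (65) and Eq. (53) imply that there exists a bounded function $h$, given by Eq. (67), such that we have

\begin{align}
\lambda_{\mathrm{symbol}} + h(u_1, \ldots, u_{d+1}, \beta) &= -\min_{\hat u} \sum_{u \in \mathcal{U}} \beta_1(u)\, \Lambda(u, \hat u) + \max_{a \in \mathcal{A}} \left[ \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a(u_2^{d+1}, \tilde u))\, h(u_2, \ldots, u_{d+1}, \tilde u, \tilde\beta) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{U}^{d+1}),\ \tilde\beta = G(\beta, a, y), \tag{83}
\end{align}

and the maximization is attained by the policy $\mu_s(\cdot, s) = \mu_s(\cdot)$ such that $\mu_s(u_1, \ldots, u_{d+1}) = \mu_s(u_1)$. By Theorem 1, this implies that the symbol by symbol coding policy is optimal.

■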

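Corollary 7: Given $\mathcal{X} \supseteq \mathcal{U}$, the uncoded symbol by symbol policy, i.e., $\mu_s(U_i) = \mu_c(U_i) = U_i$, is optimal if,

\begin{align}
&\sum_{(u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(u)\, P_{Y|X}(y \mid u)\, B(f_{\mu_c}[P_U, y]) + \sum_{k=2}^{d+1} \sum_{y \in \mathcal{Y}} P_{Y|X}(y \mid u_k)\, B(f_{\mu_c}[\beta_k, y]) \nonumber \\
&= \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a^*(u_2^{d+1}, \tilde u))\, B(\tilde\beta_1) \nonumber \\
&\quad + \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a^*(u_2^{d+1}, \tilde u)) \sum_{k=2}^{d} \left[ \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid u_{k+1})\, B(f_{\mu_c}[\tilde\beta_k, \tilde y]) \right] \nonumber \\
&\quad + \sum_{(\tilde u,y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a^*(u_2^{d+1}, \tilde u)) \left[ \sum_{\tilde y \in \mathcal{Y}} P_{Y|X}(\tilde y \mid \tilde u)\, B(f_{\mu_c}[\tilde\beta_{d+1}, \tilde y]) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{U}^{d+1}), \tag{84}
\end{align}

where $a^*(\cdot)$ is the minimizer of the right hand side of the above equation, $\tilde\beta = G(\beta, a, y)$, and $\beta_k$ and $\tilde\beta_k$ denote the marginals of the $k$-th component of $\beta$ and $\tilde\beta$, respectively.

Proof: Substitute $\mu_s = \mu_c$ in Theorem 6. ■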

V. Real-Time Coding with Limited Lookahead : Finite Memory 
A. Average Cost Optimality Equation 



In Section IV we considered the scenario where the decoder has access to the entire past channel output sequence or, equivalently, the memory is unbounded. In this section, we develop a controlled Markov process formulation for the case where the memory alphabet is finite and does not grow with time, i.e., the memory space $\mathcal{M}$ is time-independent with $|\mathcal{M}| < \infty$. We make two assumptions on our coding systems for this setting:

A1 There is a fixed time-independent memory update function, i.e., there exists a function $f_m$ such that $Z_i = f_m(Z_{i-1}, Y_i)$ for all $i$. This assumption is not very restrictive, as real systems such as quantizers or finite window storage devices store only the past few channel output symbols and evolve in a time invariant way; e.g., $Z_i = f_m(Z_{i-1}, Y_i) = Y_i$ implies that the reconstruction is given by $\hat U_i = f_{d,i}(Y_i, Y_{i-1})$.

A2 We fix the optimal decoding rule to be $f_{d,i}(Y_i, Z_{i-1}) = \hat U^{\mathrm{opt}}(Y_i, Z_{i-1})$, that is, the decoding is restricted to optimal policies among the stationary (time invariant) ones. Note that though we assume stationary decoding, optimal encoding may in general not be stationary.
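To make assumption A1 concrete, the sketch below implements one possible time-invariant update $f_m$: a sliding window that retains the last $m$ channel outputs. The helper name `make_window_update`, the tuple representation of the memory state and the empty initial state are illustrative assumptions, not notation from the paper.

```python
def make_window_update(m):
    """Return a time-invariant update f_m(z_prev, y) keeping the last m outputs."""
    def f_m(z_prev, y):
        if m == 0:
            return ()                      # no memory at all
        return (z_prev + (y,))[-m:]        # append y, drop the oldest symbol if needed
    return f_m


# Two symbols of memory, fed with a short (hypothetical) output sequence.
f_m = make_window_update(2)
z = ()                                     # assumed empty initial memory state
for y in [0, 1, 1, 0]:
    z = f_m(z, y)
```

With $m = 1$ this reduces to the example $Z_i = Y_i$ mentioned in A1.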

Hence the optimal expected average distortion depends on $d$, $f_m$ and $\mathcal{M}$, and we denote it by $D(d, f_m, \mathcal{M})$ to distinguish it from $D(d)$ of Section IV.



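Here too we begin by formulating a controlled Markov process for the modified source $\{V_i\}_{i \in \mathbb{N}}$ and then substitute for the original source. By assumption A2, the optimal decoding is stationary, $\hat V^{\mathrm{opt}}(\cdot)$. Consider the state sequence $S_i = (V_i, Z_i) \in \mathcal{S}\ (= \mathcal{V} \times \mathcal{Z})$ and the disturbance sequence $W_i = (V_i, Y_i) \in \mathcal{W}\ (= \mathcal{V} \times \mathcal{Y})$. The actions can be history dependent, $A_i = f_{e,i}(S_0, W^{i-1})$, where $S_0$ is a fixed initial state (with some distribution $P_S$). The disturbance depends on the past sequence of disturbances, states, actions and the current action only through the past state and current action, i.e.,

\begin{align}
P(W_i \mid W^{i-1}, S^{i-1}, A^i) &= P(V_i, Y_i \mid V^{i-1}, Y^{i-1}, A^i) \tag{85} \\
&= K(V_{i-1}, V_i)\, P_{Y|X}(Y_i \mid A_i(V_i)) \tag{86} \\
&= P_W(W_i \mid S_{i-1}, A_i), \tag{87}
\end{align}

and hence

$$S_i = (V_i, Z_i) = (V_i, f_m(Y_i, Z_{i-1})) = F(S_{i-1}, A_i, W_i). \tag{88}$$

Note that for our transformation (as in Section IV) the modified cost function is given by,

$$\Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) = \Lambda(U_i, \hat U^{\mathrm{opt}}(Y_i, Z_{i-1})). \tag{89}$$

Therefore consider,

\begin{align}
&E\left[ \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \mid V^{i-1}, Y^{i-1}, Z^{i-1}, A^i \right] \tag{90} \\
&= \sum_{v \in \mathcal{V},\, y \in \mathcal{Y}} P(V_i = v, Y_i = y \mid V^{i-1}, Y^{i-1}, Z^{i-1}, A^i)\, \Lambda(v, \hat V^{\mathrm{opt}}(y, Z_{i-1})) \tag{91} \\
&= \sum_{v \in \mathcal{V},\, y \in \mathcal{Y}} K(V_{i-1}, v)\, P(y \mid A_i(v))\, \Lambda(v, \hat V^{\mathrm{opt}}(y, Z_{i-1})) \tag{92} \\
&= -g(S_{i-1}, A_i). \tag{93}
\end{align}

Thus,

\begin{align}
&\frac{1}{n} \sum_{i=1}^{n} E\left[ \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \right] \tag{94} \\
&= \frac{1}{n} \sum_{i=1}^{n} E\left[ E\left[ \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \mid V^{i-1}, Y^{i-1}, Z^{i-1}, A^i \right] \right] \tag{95} \\
&= -\frac{1}{n} \sum_{i=1}^{n} E\left[ g(S_{i-1}, A_i) \right], \tag{96}
\end{align}

and consequently,

$$\inf \limsup_{n\to\infty} \frac{1}{n} E\left[ \sum_{i=1}^{n} \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \right] = -\sup \liminf_{n\to\infty} \frac{1}{n} E\left[ \sum_{i=1}^{n} g(S_{i-1}, A_i) \right]. \tag{97}$$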
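Thus the problem is to find the optimal policy for the controlled Markov process $T = (\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g)$ which maximizes the average reward under the cost function $g$. The optimal reward is given by,

$$\lambda^{\mathrm{opt}} = \sup \liminf_{n\to\infty} \frac{1}{n} E\left[ \sum_{i=1}^{n} g(S_{i-1}, A_i) \right]. \tag{98}$$

Thus the ACOE for the controlled Markov process $T = (\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g)$ is given by,

$$\lambda + h(s) = \sup_{a \in \mathcal{A}} \left[ g(s,a) + \sum_{w \in \mathcal{W}} P_W(w \mid s, a)\, h(F(s,a,w)) \right] \quad \forall s \in \mathcal{S}, \tag{99}$$

$$\lambda + h(v,z) = \sup_{a \in \mathcal{A}} \left[ g(v,z,a) + \sum_{w \in \mathcal{W}} P_W(w \mid v,z,a)\, h(F(v,z,a,w)) \right] \quad \forall v \in \mathcal{V},\ z \in \mathcal{Z}. \tag{100}$$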



We will now transform back the setting from the Markov source $\{V_i\}_{i \in \mathbb{N}}$ to the i.i.d. source $\{U_i\}_{i \in \mathbb{N}}$. Let us denote $v = (u_1, u_2, \ldots, u_{d+1})$. Since $\Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) = \Lambda(U_i, \hat U^{\mathrm{opt}}(Y_i, Z_{i-1}))$, we have,

\begin{align}
g(s,a) &= g(u_1^{d+1}, z, a) \nonumber \\
&= -\sum_{\tilde u \in \mathcal{U},\, y \in \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a(u_2, \ldots, u_{d+1}, \tilde u))\, \Lambda(u_2, \hat U^{\mathrm{opt}}(y, z)). \tag{101}
\end{align}

Thus the resulting ACOE for our problem of an i.i.d. source with lookahead $d$ (replacing again sup by max due to finiteness of the action set) is,

\begin{align}
&\lambda + h(u_1, \ldots, u_{d+1}, z) \nonumber \\
&= \max_{a \in \mathcal{A}} \sum_{\tilde u \in \mathcal{U},\, y \in \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a(u_2, \ldots, u_{d+1}, \tilde u)) \left[ h(u_2, \ldots, u_{d+1}, \tilde u, f_m(y, z)) - \Lambda(u_2, \hat U^{\mathrm{opt}}(y, z)) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ z \in \mathcal{Z}. \tag{102}
\end{align}

Here again, invoking Theorem 1 implies that if the ACOE in Eq. (102) is solved by a real $\lambda^*$ and a bounded $h(\cdot)$, then $D(d, f_m, \mathcal{M}) = -\lambda^*$.

Note 5: The reason for making assumptions A1 and A2 is that the modification of the cost function $\Lambda(\cdot, \cdot)$ results in $g(\cdot, \cdot)$ which is time invariant, and in a state sequence evolving through a function $F(\cdot, \cdot, \cdot)$ which is also time invariant.

Theorem 8 (Optimality of Stationary Policy): The ACOE in Eq. (102) admits a stationary optimal policy.

Proof: This follows from Theorem 4.3 in BTl as for a fixed lookahead, d, the state space and action space are finite. 
Hence the optimal encoding is $X_i = \mu(U_{i-1}^{i+d}, Z_{i-1})$. ■

B. Computing $D(d, f_m, \mathcal{M})$, Bounding $D(d)$



In this section, we explicitly compute $D(d, f_m, \mathcal{M})$. Note that in the setting of complete memory in Section IV the average cost optimality equation can also be solved approximately. As the state space is compact, it admits discretization, after which value or policy iteration can be run to obtain approximations to the optimal distortion. References [34] and [35] provide an extensive treatment along with prescriptions of error bounds and the trade off between quantization resolution and the precision of the approximated optimal reward for discounted cost problems. However, the computational point of view we take in this section does not follow the path of discretization and then approximation of the average reward. Rather, we compute $D(d, f_m, \mathcal{M})$ exactly, which provides non-trivial upper bounds on $D(d)$. This is illustrated by the following example.

We assume $\mathcal{U} = \mathcal{X} = \mathcal{Y} = \hat{\mathcal{U}} = \{0, 1\}$. The source is Bern($p$), $p \in [0, 0.5]$, the channel is BSC($\delta$), $\delta \in [0, 0.5]$, and the loss function is Hamming distortion. The memory is of $m$ bits and retains the last $m$ channel outputs, and hence $\hat U_i = \hat U^{\mathrm{opt}}(Y_{i-m}^{i})$. We will denote the optimal expected average distortion by $D(d, m)$ in this case. We observe the following:

• $D(0, m) = D(0) = D_{\mathrm{symbol}} = \min_{X : \mathcal{U} \to \mathcal{X}} E\left[ \Lambda\left(U, \hat U_{\mathrm{Bayes}}(P_{U|Y})\right) \right] = \min\{p, \delta\}$, $\forall m$.
• $D(d) \le D(d, m_1) \le D(d, m_2) \le D(0)$ $\forall m_1 \ge m_2$.
• $D(d) \le D(d_1, m) \le D(d_2, m) \le D(0)$ $\forall d_1 \ge d_2$.
• $D(\infty) \le D(d) \le D(d, m) \le D(0)$ $\forall m, d$.
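The identity $D(0) = \min\{p, \delta\}$ in the first observation can be checked numerically by brute force over the four binary encoders. The sketch below is an illustrative check only; the function name `symbol_by_symbol_distortion` and the test values of $p$ and $\delta$ are assumptions made here.

```python
from itertools import product


def symbol_by_symbol_distortion(p, delta):
    """Minimum Hamming distortion over encoders X: {0,1} -> {0,1} with Bayes decoding."""
    source = [1 - p, p]                                   # P_U
    channel = [[1 - delta, delta], [delta, 1 - delta]]    # channel[x][y] = P(y | x)
    best = float("inf")
    for encoder in product([0, 1], repeat=2):             # encoder[u] is the channel input
        distortion = 0.0
        for y in (0, 1):
            joint = [source[u] * channel[encoder[u]][y] for u in (0, 1)]
            # Under Hamming loss the Bayes decoder guesses the likelier u,
            # so the error mass for this y is the smaller joint probability.
            distortion += min(joint)
        best = min(best, distortion)
    return best


d_noisy_source = symbol_by_symbol_distortion(0.3, 0.1)
d_noisy_channel = symbol_by_symbol_distortion(0.1, 0.3)
```

For $p \le \delta$ a constant encoder (the decoder always guesses 0) is already optimal, while for $\delta \le p$ uncoded transmission is; both cases give $\min\{p, \delta\}$.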

Lemma 9: $D(\infty) = D_{\min}$, where $D_{\min}$ is the minimum achievable distortion of the joint source-channel communication problem.

Proof: Any sequential or limited delay encoding and decoding scheme can obviously be embedded into and emulated arbitrarily closely by a sequence of block codes. Hence $D_{\min} \le D(\infty)$. We will now prove $D_{\min} \ge D(\infty)$. This is equivalent to proving that a block coding scheme attaining distortion arbitrarily close to $D_{\min}$ can be used to construct a sequential scheme with sufficiently large lookahead that attains essentially the same distortion; hence $D(\infty) \le D_{\min}$. The argument is based on block Markov coding (cf. Section on Coherent Multihop Lower Bound, Chapter 17, [36]), except that instead of coding via looking at the past block as in block Markov coding, here we look at the future block. Fix an arbitrarily small $\epsilon > 0$, an arbitrarily large $B$, and an $n$ sufficiently large that there exists a block coding scheme of blocklength $n$ achieving per symbol distortion no larger than $D_{\min} + \epsilon$. Let $f_e$ and $f_d$ denote the encoding and decoding mappings of that $\epsilon$-achieving scheme. We now construct a sequential scheme with lookahead $d = 2n$ as follows:

• Encoding: We code in the present block using the source symbols of the future block, i.e., $X^n(b) = f_e(U_{bn+1}^{(b+1)n})$ for $b = 1, \ldots, B - 1$, and use some dummy coding, known to the decoder, for the last block.

• Decoding: For the first block the decoder has some predefined reconstruction. For block $b$, the decoder constructs the source symbols as $\hat U^n(b) = f_d(Y_{(b-2)n+1}^{(b-1)n})$, $b = 2, \ldots, B$; thus decoding is in blocks using the past block.

The per-symbol distortion achieved by this scheme is clearly upper bounded by $\frac{\Lambda_{\max} + (B-1)(D_{\min} + \epsilon)}{B}$, which can be made arbitrarily close to $D_{\min}$ for sufficiently small $\epsilon$ and sufficiently large $B$. ■
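As an illustrative numerical instance of this bound (the values are assumed here purely for illustration and do not come from the paper), take $\Lambda_{\max} = 1$, $D_{\min} + \epsilon = 0.1$ and $B = 100$:

$$\frac{\Lambda_{\max} + (B-1)(D_{\min} + \epsilon)}{B} = \frac{1 + 99 \times 0.1}{100} = 0.109.$$

The excess over $D_{\min} + \epsilon$ is $(\Lambda_{\max} - D_{\min} - \epsilon)/B = 0.9/100 = 0.009$, the price of the single block that is reconstructed without any channel information, and it vanishes as $B$ grows.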
We have run relative value iteration to compute $D(d, m)$ for $d = 1$ and some values of $m$, yielding some interesting upper bounds on $D(1)$. Note that the values obtained are exact and do not approximate the distortion, as the relative value iteration converges in a few iterations. This is because the state space and action space are finite, and it is easy to check that the weak accessibility condition (Definition 4.2.2 of [37]) is satisfied. By Proposition 4.3.1 of [37], this implies that relative value iteration converges. Fig. 3 shows the distortion values as a function of the source distribution when the cross over probability is fixed, $\delta = 0.3$. Fig. 4 shows the distortion values as a function of the channel cross over probability when the source distribution is fixed, $p = 0.3$.
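For readers unfamiliar with relative value iteration, a generic sketch for a finite average-reward problem is given below. It is not the authors' implementation: the function name, the data layout (`rewards[s][a]`, `transitions[s][a][t]`), the choice of reference state and the two-state toy problem are all assumptions made for illustration. Applying it to the ACOE in Eq. (102) would additionally require enumerating the states $(u_1, \ldots, u_{d+1}, z)$ and the actions, which is not shown here.

```python
def relative_value_iteration(rewards, transitions, ref_state=0, tol=1e-10, max_iter=10000):
    """Relative value iteration for a finite average-reward controlled Markov chain.

    rewards[s][a]        : one-step reward g(s, a)
    transitions[s][a][t] : probability of moving from state s to state t under action a
    Returns (gain, h, policy); gain approximates lambda, h the relative value function.
    """
    n_states = len(rewards)
    h = [0.0] * n_states
    gain, q = 0.0, []
    for _ in range(max_iter):
        # One Bellman backup: q[s][a] = g(s, a) + sum_t P(t | s, a) h(t)
        q = [[rewards[s][a] + sum(p * h[t] for t, p in enumerate(transitions[s][a]))
              for a in range(len(rewards[s]))] for s in range(n_states)]
        backed_up = [max(row) for row in q]
        gain = backed_up[ref_state]              # normalize at the reference state
        new_h = [v - gain for v in backed_up]
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return gain, h, policy


# Hypothetical toy problem: reward 1 is earned in state 1 only;
# action 0 tends to stay (prob. 0.9), action 1 tends to switch (prob. 0.9).
toy_rewards = [[0.0, 0.0], [1.0, 1.0]]
toy_transitions = [[[0.9, 0.1], [0.1, 0.9]],
                   [[0.1, 0.9], [0.9, 0.1]]]
gain, h, policy = relative_value_iteration(toy_rewards, toy_transitions)
```

In the toy problem the best policy switches out of state 0 and stays in state 1, so the chain spends a fraction 0.9 of the time in state 1 and the gain is 0.9.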

These plots provide insight into the structure of optimal policies in the setting of Section IV, given that we are considering a Bern($p$) source and a BSC($\delta$) channel under Hamming loss with $|\mathcal{X}| = |\mathcal{U}| = 2$. Since $D(1, 2)$ is an upper bound on $D(1)$ and hence on $D(d)$, $d \ge 1$, it is clear that for source distributions and channel cross over probabilities where $D(1, 2) < D(0)$, symbol by symbol coding is not optimal. We evaluate this region and show it in Fig. 5. Note that when $d = \infty$, since separation is optimal, the region of suboptimality of symbol by symbol coding is the complete square ($p \in (0, 0.5)$, $\delta \in (0, 0.5)$) except the boundary, where symbol by symbol coding is optimal. Also note, as is consistent with the plots, that in the zero lookahead case we have $D(0) = \min\{p, \delta\}$. Hence, for any lookahead $d$ and a fixed cross over probability $\delta$, if symbol by symbol encoding-decoding achieves $D(d)$ for $p = p_0 < 0.5$, then it is also optimal for $p \in (p_0, 1 - p_0]$. Similarly, for a fixed source probability $p$, if symbol by symbol encoding-decoding is optimal for $\delta = \delta_0 < 0.5$, then it is also optimal for $\delta \in (\delta_0, 1 - \delta_0]$.

Fig. 3. Computing and contrasting $D(d, m)$ and $D(d)$ for a Bernoulli source, binary symmetric channel and Hamming loss. We fix the channel cross over probability $\delta = 0.3$ and vary the source probability in $[0, 0.5]$. For $d = 1$, we have plotted values for increasing memory, $m = 0, 1, 2$, which yield a series of non-trivial, non-increasing upper bounds on $D(1)$. $D(0)$ is achieved by symbol by symbol coding, while $D(\infty) = D_{\min}$ of Shannon's joint source channel coding (achieved by separation).



Fig. 4. Computing and contrasting $D(d, m)$ and $D(d)$ for a Bernoulli source, binary symmetric channel and Hamming loss. We fix the source probability $p = 0.3$ and vary the channel cross over probability in $[0, 0.5]$. For $d = 1$, we have plotted values for increasing memory, $m = 0, 1, 2$, which yield a series of non-trivial, non-increasing upper bounds on $D(1)$. $D(0)$ is achieved by symbol by symbol coding, while $D(\infty) = D_{\min}$ of Shannon's joint source channel coding (achieved by separation).

Fig. 5. The plot shows a region in the source-channel plane where symbol by symbol coding is suboptimal among schemes with lookahead $d = 1$, i.e., it does not achieve $D(1)$. This is the shaded region which is bounded inside by the two curves.



VI. Real Time Coding with Limited Lookahead : In the Absence of Feedback 

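In the previous sections we assumed the availability of perfect unit delay feedback from the decoder to the encoder. We now consider the same setting as that depicted in Fig. 1 but without feedback, and we formulate the problem as a controlled Markov process. Decoders have finite state (see Note 6) and the assumptions A1 and A2 are presumed for similar reasons as in Note 5 in Section V, i.e., memory is finite and decoding is stationary. Here again, we first study the system with the modified source $\{V_i\}_{i \in \mathbb{N}}$. The state for this problem is $S_i = (V_i, P_{Z_i|V^i}) \in \mathcal{S}$ and the disturbance is $W_i = V_i$. The actions are thus history dependent, $A_i = f_{e,i}(S_0, V^{i-1}) = f_{e,i}(S_0, W^{i-1})$, where $S_0$ is some fixed initial state with distribution $P_S$. Due to Markovity of the source we have,

\begin{align}
P(W_i \mid W^{i-1}, S^{i-1}, A^i) &= K(V_{i-1}, V_i) \tag{103} \\
&= P_W(W_i \mid S_{i-1}, A_i). \tag{104}
\end{align}

Denoting $\{P_{Z_i|V^i}(z)\}_{z \in \mathcal{Z}}$ by $\beta_i$,

\begin{align}
\beta_i(z) &= P_{Z_i|V^i}(z) \tag{105} \\
&= \frac{\sum_{z_{i-1}, y_i} P_{Z_{i-1}, Y_i, Z_i, V_i | V^{i-1}}(z_{i-1}, y_i, z, V_i)}{\sum_{z_{i-1}, y_i, \tilde z} P_{Z_{i-1}, Y_i, Z_i, V_i | V^{i-1}}(z_{i-1}, y_i, \tilde z, V_i)} \tag{106} \\
&= \frac{\sum_{z_{i-1}, y_i} P_{Z_{i-1}|V^{i-1}}(z_{i-1})\, K(V_{i-1}, V_i)\, P(y_i \mid A_i(V_i))\, 1_{\{z = f_m(y_i, z_{i-1})\}}}{\sum_{z_{i-1}, y_i, \tilde z} P_{Z_{i-1}|V^{i-1}}(z_{i-1})\, K(V_{i-1}, V_i)\, P(y_i \mid A_i(V_i))\, 1_{\{\tilde z = f_m(y_i, z_{i-1})\}}} \tag{107} \\
&= \frac{\sum_{z_{i-1}, y_i} \beta_{i-1}(z_{i-1})\, K(V_{i-1}, V_i)\, P(y_i \mid A_i(V_i))\, 1_{\{z = f_m(y_i, z_{i-1})\}}}{\sum_{z_{i-1}, y_i, \tilde z} \beta_{i-1}(z_{i-1})\, K(V_{i-1}, V_i)\, P(y_i \mid A_i(V_i))\, 1_{\{\tilde z = f_m(y_i, z_{i-1})\}}}. \tag{108}
\end{align}

This implies,

$$\beta_i = \xi(\beta_{i-1}, V_i, V_{i-1}, A_i) = \xi(S_{i-1}, A_i, W_i), \qquad S_i = F(S_{i-1}, A_i, W_i). \tag{109}$$

We now have for average cost, with modified cost function $\Lambda(V_i, \hat V_i) = \Lambda(U_i, \hat U_i)$ and stationary decoding $\hat V^{\mathrm{opt}}(\cdot)$,

\begin{align}
&E\left[ \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \mid V^{i-1}, \beta^{i-1}, A^i \right] \tag{110} \\
&= \sum_{z \in \mathcal{Z},\, v \in \mathcal{V},\, y \in \mathcal{Y}} P(Z_{i-1} = z, V_i = v, Y_i = y \mid V^{i-1}, \beta^{i-1}, A^i)\, \Lambda(v, \hat V^{\mathrm{opt}}(y, z)) \tag{111} \\
&= \sum_{z \in \mathcal{Z},\, v \in \mathcal{V},\, y \in \mathcal{Y}} \beta_{i-1}(z)\, K(V_{i-1}, v)\, P(y \mid A_i(v))\, \Lambda(v, \hat V^{\mathrm{opt}}(y, z)) \tag{112} \\
&= -g(V_{i-1}, \beta_{i-1}, A_i) \tag{113} \\
&= -g(S_{i-1}, A_i). \tag{114}
\end{align}

Thus,

\begin{align}
&\frac{1}{n} \sum_{i=1}^{n} E\left[ \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \right] \tag{115} \\
&= \frac{1}{n} \sum_{i=1}^{n} E\left[ E\left[ \Lambda(V_i, \hat V^{\mathrm{opt}}(Y_i, Z_{i-1})) \mid V^{i-1}, \beta^{i-1}, A^i \right] \right] \tag{116} \\
&= -\frac{1}{n} \sum_{i=1}^{n} E\left[ g(S_{i-1}, A_i) \right]. \tag{117}
\end{align}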



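We finally write down the ACOE after transforming to the original source (similar to the previous sections),

\begin{align}
\lambda + h(u_1, \ldots, u_{d+1}, \beta) &= \max_{a \in \mathcal{A}} \left[ g(u_1^{d+1}, \beta, a) + \sum_{(\tilde u, y) \in \mathcal{U} \times \mathcal{Y}} P_U(\tilde u)\, P_{Y|X}(y \mid a(u_2, \ldots, u_{d+1}, \tilde u))\, h(u_2, \ldots, u_{d+1}, \tilde u, \tilde\beta) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{Z}), \tag{118}
\end{align}

where

\begin{align}
g(s,a) &= g(u_1^{d+1}, \beta, a) \nonumber \\
&= -\sum_{z \in \mathcal{Z},\, \tilde u \in \mathcal{U},\, y \in \mathcal{Y}} \beta(z)\, P_U(\tilde u)\, P(y \mid a(u_2, \ldots, u_{d+1}, \tilde u))\, \Lambda(u_2, \hat U^{\mathrm{opt}}(y, z)), \tag{119}
\end{align}

and

\begin{align}
\tilde\beta &= \xi\left( (u_1, \ldots, u_{d+1}, \beta), a, (u_2, \ldots, u_{d+1}, \tilde u) \right) \tag{120} \\
&= \left[ \frac{\sum_{z \in \mathcal{Z},\, y \in \mathcal{Y}} \beta(z)\, P(\tilde u)\, P(y \mid a(u_2, \ldots, u_{d+1}, \tilde u))\, 1_{\{\tilde z = f_m(y, z)\}}}{\sum_{\tilde z \in \mathcal{Z}} \sum_{z \in \mathcal{Z},\, y \in \mathcal{Y}} \beta(z)\, P(\tilde u)\, P(y \mid a(u_2, \ldots, u_{d+1}, \tilde u))\, 1_{\{\tilde z = f_m(y, z)\}}} \right]_{\tilde z \in \mathcal{Z}}. \tag{121}
\end{align}

Theorem 1 implies that if the ACOE in Eq. (118) is solved by a real $\lambda^*$ and a bounded $h(\cdot)$, then $D(d, f_m, \mathcal{M}) = -\lambda^*$. The results on the structure of optimal policies parallel those outlined in Note 4 and hence are omitted.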
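The memory-belief update without feedback can be sketched as follows for a fixed channel input. The function name `memory_belief_update`, the integer encoding of memory states and the toy numbers are assumptions introduced here; since the source term $P(\tilde u)$ appears in both numerator and denominator of the update, it cancels and is omitted in the sketch.

```python
def memory_belief_update(beta, channel_row, f_m, n_memory):
    """Propagate the encoder's belief on the decoder memory state by one step.

    beta[z]        : current belief that the decoder memory equals z
    channel_row[y] : P(y | x) for the channel input x chosen by the action
    f_m(y, z)      : memory update function, returning the new memory state
    n_memory       : number of memory states
    """
    new_beta = [0.0] * n_memory
    for z, prob_z in enumerate(beta):
        for y, prob_y in enumerate(channel_row):
            new_beta[f_m(y, z)] += prob_z * prob_y   # mass flows to the updated state
    total = sum(new_beta)
    return [b / total for b in new_beta]


# Hypothetical example: one bit of memory storing the last output,
# BSC with crossover 0.1 and channel input x = 0.
updated = memory_belief_update([0.5, 0.5], [0.9, 0.1], lambda y, z: y, n_memory=2)
```

Because the encoder does not observe the channel output here, the update averages over $y$ instead of conditioning on it, in contrast to the recursion $G(\beta, a, y)$ of Section IV.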

Note 6: For the setting considered in this section, where no feedback is present, we have restricted our attention to finite state decoders only, unlike the previous sections where feedback was present and we also considered the case where decoding used complete memory. This is because, in the absence of feedback, when decoding uses complete memory the state space is the simplex of distributions on an alphabet that grows exponentially with the time index, and hence the results of the theory presented in Section III are not as directly applicable.



VII. Sequential Source Coding with A Side Information "Vending Machine" 

In previous sections we considered the problem of real time source-channel communication when the encoder generates channel input symbols sequentially with a lookahead, with or without unit delay noise-free feedback, and the decoder generates the estimate of the source given the channel output and the memory. In this section we consider a rate-distortion problem, where encoding is sequential with lookahead. In addition, the decoder can take cost constrained actions, also in a sequential fashion, which affect the quality of the side information correlated with the source symbol it attempts to reconstruct. We consider two classes of such models: one where the encoder has access to the past side information symbols through unit delay noise-free feedback (Section VII-A) and the other where it does not (Section VII-B). The findings of this section are similar in spirit to those of previous sections and assert the universality of the methodology invoked in the paper. We defer the proofs in this section to the Appendix.



A. Encoder has access to Side Information 

The setting depicted in Fig. 6 consists of the following blocks:

Fig. 6. The setting of sequential source coding with source lookahead at the encoder and side information vending machine at the decoder. The encoder also knows the past side information symbols through a unit delay noise-free feedback from the decoder.

• Source Encoder: The encoder has access to source symbols up to a lookahead $d$ and to the past side information symbols, i.e., $X_i = f_{e,i}(U^{i+d}, Y^{i-1})$, where $f_{e,i}$ is the encoding function, $f_{e,i} : \mathcal{U}^{i+d} \times \mathcal{Y}^{i-1} \to \mathcal{X}$, $i \in \mathbb{N}$.

• Memory X: The decoder might not be able to use all of the encoded symbols up to the current time due to memory constraints. Memory X is updated as a function of the past state of the memory and the current encoder output, i.e., $M_i = f_{m,i}(M_{i-1}, X_i)$, where $f_{m,i}$ is the memory update function, $f_{m,i} : \mathcal{M}_{i-1} \times \mathcal{X} \to \mathcal{M}_i$, $i \in \mathbb{N}$. Note that the alphabet $\mathcal{M}_i$ can grow with $i$, hence this includes the special case of complete memory, i.e., $M_i = X^i$.

• Actuator: The actuator uses the past state of Memory X and the current encoded symbol to generate an action, i.e., $A_{v,i} = f_{v,i}(M_{i-1}, X_i)$, where $f_{v,i} : \mathcal{M}_{i-1} \times \mathcal{X} \to \mathcal{A}_v$. The action sequence should satisfy the following cost constraint,

$$\limsup_{n\to\infty} E\left[ \frac{1}{n} \sum_{i=1}^{n} C(A_{v,i}) \right] \le \Gamma, \tag{122}$$

where $C(\cdot)$ is the cost function and $\Gamma$ is the cost constraint.

• Side Information "Vending Machine": The side information is generated according to $P_{Y|U,A_v}$, i.e.,

$$P(y_i \mid u^{i+d}, x^{i}, a_v^{i}) = P_{Y|U,A_v}(y_i \mid u_i, a_{v,i}). \tag{123}$$

• Memory Y: The decoder may be limited in its ability to remember all the side information up to the current time due to memory constraints. Memory Y is updated as a function of the past state of the memory and the current side information, i.e., $N_i = f_{n,i}(N_{i-1}, Y_i)$, where $f_{n,i}$ is the memory update function, $f_{n,i} : \mathcal{N}_{i-1} \times \mathcal{Y} \to \mathcal{N}_i$, $i \in \mathbb{N}$. Here also the alphabet $\mathcal{N}_i$ can grow with $i$, hence this also includes the special case of complete memory, i.e., $N_i = Y^i$.

• Source Decoder: The source decoder uses the current encoded symbol, the current side information and the past memory states to construct its estimate of the source symbol, i.e., $\hat U_i = f_{d,i}(X_i, Y_i, M_{i-1}, N_{i-1})$, where the decoding rule is the map $f_{d,i} : \mathcal{X} \times \mathcal{Y} \times \mathcal{M}_{i-1} \times \mathcal{N}_{i-1} \to \hat{\mathcal{U}}$. The complete memory case corresponds to the decoding $\hat U_i(X^i, Y^i)$.

The alphabets $\mathcal{U}, \mathcal{X}, \mathcal{A}_v, \mathcal{Y}, \mathcal{M}, \mathcal{N}$ are assumed to be finite. Note that the finiteness of the alphabets implies we may assume, without loss of generality, that $0 \le \Lambda(\cdot) \le \Lambda_{\max} < \infty$ and $0 \le C(\cdot) \le \Gamma_{\max} < \infty$. We make the further assumption that there exists $a \in \mathcal{A}_v$ such that $C(a) = 0$. Thus it makes sense to consider cost constraints $\Gamma \in [0, \Gamma_{\max}]$.

Our approach to the construction of the ACOE is similar to that taken in previous sections. We first consider the system with the modified source $\{V_i = U_i^{i+d}\}_{i \in \mathbb{N}}$, and it is equivalent to consider source and action encoding rules as mappings $\{f_{e,i}(v, V^{i-1}, Y^{i-1})\}_{v \in \mathcal{V}}$ and $\{f_{v,i}(x, X^{i-1})\}_{x \in \mathcal{X}}$. Hence the modified vending machine is,

$$P(Y_i \mid V_i, A_{v,i}(X_i)) = P(Y_i \mid U_i, A_{v,i}(X_i)), \tag{124}$$

and the modified cost function is,

$$\Lambda(V_i, \hat V_i) = \Lambda(U_i, \hat U_i). \tag{125}$$

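We study two scenarios under this setting:

1) Complete Memory: Here $M_i = X^i$ and $N_i = Y^i$. Note that we can restrict our attention to optimal decoders of the form $\hat U_i(X^i, Y^i) = \hat U_{\mathrm{Bayes}}(X^i, Y^i)$ (cf. Lemma 3). Let us denote the minimum expected average distortion by $D_a^{FB}(d)$. Here the superscript FB indicates that side information is available as feedback to the encoder, the subscript $a$ denotes the presence of actions, and $d$ stands for the lookahead. The average cost optimality equation is,

\begin{align}
&\rho^\lambda(u_1, \ldots, u_{d+1}, \beta) + h^\lambda(u_1, \ldots, u_{d+1}, \beta) \nonumber \\
&= \max_{(a_e, a_v)} \left[ g^\lambda(u_1, \ldots, u_{d+1}, \beta, a_e, a_v) + \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, y \in \mathcal{Y}} P(\tilde u)\, 1_{\{\tilde x = a_e(u_2, \ldots, u_{d+1}, \tilde u)\}}\, P(y \mid u_2, a_v(\tilde x))\, h^\lambda(u_2, \ldots, u_{d+1}, \tilde u, \tilde\beta) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{U}^{d+1}), \tag{126}
\end{align}

where $(a_e, a_v) \in \mathcal{A}_e \times \mathcal{A}_v$, $\tilde\beta = G(\beta, a_e, a_v, (u_2, \tilde u, \tilde x, y))$ (cf. Appendix B) is the updated belief, and $g^\lambda(\cdot)$ is the Lagrangian augmented cost,

\begin{align}
&g^\lambda(u_1, \ldots, u_{d+1}, \beta, a_e, a_v) \nonumber \\
&= g(u_1, \ldots, u_{d+1}, \beta, a_e, a_v) + \lambda\left( \Gamma - l(u_1, \ldots, u_{d+1}, \beta, a_e, a_v) \right) \tag{127} \\
&= -\min_{\hat u} \sum_{u \in \mathcal{U}} \beta_1(u)\, \Lambda(u, \hat u) + \lambda\left( \Gamma - \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X}} P(\tilde u)\, 1_{\{\tilde x = a_e(u_2, \ldots, u_{d+1}, \tilde u)\}}\, C(a_v(\tilde x)) \right). \tag{128}
\end{align}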
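The structure of the Lagrangian augmented cost can be illustrated with a one-line helper. The function name `lagrangian_reward` and the sample numbers are hypothetical; the sketch only assumes that $g$ is the negative expected distortion and $l$ the expected action cost for the state-action pair in question.

```python
def lagrangian_reward(expected_distortion, expected_cost, lam, gamma):
    """g^lambda = g + lam * (Gamma - l), with g = -expected_distortion, l = expected_cost."""
    return -expected_distortion + lam * (gamma - expected_cost)


# Hypothetical numbers: distortion 0.2, action cost 0.5, multiplier 2, budget 0.3.
penalized = lagrangian_reward(0.2, 0.5, lam=2.0, gamma=0.3)
unpenalized = lagrangian_reward(0.2, 0.5, lam=0.0, gamma=0.3)
```

A larger multiplier $\lambda$ penalizes actions whose expected cost exceeds the budget $\Gamma$ more heavily, while $\lambda = 0$ recovers the unconstrained reward.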



We now have the following theorem, with proof in Appendix B.

Theorem 10: For a fixed lookahead $d$, let $(\rho^\lambda(\cdot), h^\lambda(\cdot))$ solve the ACOE in Eq. (126). Then the optimal average distortion is given by,

$$D_a^{FB}(d) = -\inf_{\lambda \ge 0}\ \sup_{x \in \mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{U}^{d+1})} \rho^\lambda(x). \tag{129}$$



Note 7: Note that for a fixed finite lookahead $d \ge 1$, $D_a^{FB}(d)$ contrasts with the minimum distortion at $d = 0$, where symbol by symbol encoding, action-encoding and decoding are optimal, i.e.,

$$\min_{X : \mathcal{U} \to \mathcal{X},\ A : \mathcal{X} \to \mathcal{A}_v} E\left[ \Lambda\left(U, \hat U_{\mathrm{Bayes}}(P_{U|X,Y})\right) \right], \tag{130}$$

while at infinite lookahead, $d = \infty$, the minimum distortion is given by the distortion rate function at unit rate by results from [22], i.e.,

$$\min E\left[ \Lambda\left(U, \hat U^{\mathrm{opt}}(W, Y)\right) \right] \tag{131}$$

such that

$$I(U; W, A_v) \le \log_2 |\mathcal{X}|, \qquad |\mathcal{W}| \le |\mathcal{U}||\mathcal{A}_v| + 2, \qquad E[C(A_v)] \le \Gamma,$$

where $I(\cdot\,; \cdot)$ is the mutual information (cf. [38]). The above distortion is the distortion rate function (cf. Theorem 3 of [22]) evaluated at rate $\log_2 |\mathcal{X}|$, which is determined by the cardinality of the alphabet $\mathcal{X}$. The proofs of Equations (130) and (131) are similar to those of Lemma 5 and Lemma 9.

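2) Finite Memory: In this section, all memories are finite (not growing with time). With the objective of minimizing the expected distortion, we cast this problem as a constrained Markov decision process. To be able to do that, for reasons discussed in Section V, we assume $f_{m,i} = f_m$, $f_{n,i} = f_n$, $\mathcal{M}_i = \mathcal{M}$ and $\mathcal{N}_i = \mathcal{N}$ for all $i \in \mathbb{N}$, the alphabets $\mathcal{M}$, $\mathcal{N}$ being finite. We further assume stationary optimal decoding and actuator policies, i.e., $f_{d,i}(\cdot, \cdot, \cdot, \cdot) = \hat U^{\mathrm{opt}}(\cdot, \cdot, \cdot, \cdot)$ and $f_{v,i}(\cdot, \cdot) = A^{\mathrm{opt}}(\cdot, \cdot)$ for all $i \in \mathbb{N}$.

Fix a lookahead $d$. Now for fixed $\lambda \ge 0$, the average cost optimality equation is,

\begin{align}
&\rho^\lambda(u_1, \ldots, u_{d+1}, m, n) + h^\lambda(u_1, \ldots, u_{d+1}, m, n) \nonumber \\
&= \max_{a \in \mathcal{A}} \left[ g^\lambda(u_1, \ldots, u_{d+1}, m, n, a) + \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, y \in \mathcal{Y}} P(\tilde u)\, 1_{\{\tilde x = a(u_2, \ldots, u_{d+1}, \tilde u)\}}\, P(y \mid u_2, A^{\mathrm{opt}}(\tilde x))\, h^\lambda(u_2, \ldots, u_{d+1}, \tilde u, \tilde m, \tilde n) \right] \nonumber \\
&\qquad \forall (u_1, \ldots, u_{d+1}) \in \mathcal{U}^{d+1},\ m \in \mathcal{M},\ n \in \mathcal{N}, \tag{132}
\end{align}

where $\tilde m = f_m(m, \tilde x)$ and $\tilde n = f_n(n, y)$ are memory updates and $g^\lambda(\cdot)$ is the Lagrangian augmented cost,

\begin{align}
&g^\lambda(u_1, \ldots, u_{d+1}, m, n, a) \nonumber \\
&= g(u_1, \ldots, u_{d+1}, m, n, a) + \lambda\left( \Gamma - l(u_1, \ldots, u_{d+1}, m, n, a) \right) \tag{133} \\
&= -\sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, y \in \mathcal{Y}} P(\tilde u)\, 1_{\{\tilde x = a(u_2, \ldots, u_{d+1}, \tilde u)\}}\, P(y \mid u_2, A^{\mathrm{opt}}(\tilde x))\, \Lambda\left(u_2, \hat U_{\mathrm{Bayes}}(\tilde x, y, m, n)\right) \nonumber \\
&\quad + \lambda\left( \Gamma - \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X}} P(\tilde u)\, 1_{\{\tilde x = a(u_2, \ldots, u_{d+1}, \tilde u)\}}\, C(A^{\mathrm{opt}}(\tilde x)) \right). \tag{134}
\end{align}

Let us denote the optimal distortion by $D_a^{FB}(d, \mathcal{M}, \mathcal{N})$. We now have the following theorem, with proof in Appendix C.

Theorem 11: For a fixed lookahead $d$, let $(\rho^\lambda(\cdot), h^\lambda(\cdot))$ solve the ACOE in Eq. (132). Then the optimal average distortion is given by,

$$D_a^{FB}(d, \mathcal{M}, \mathcal{N}) = -\inf_{\lambda \ge 0}\ \sup_{x \in \mathcal{U}^{d+1} \times \mathcal{M} \times \mathcal{N}} \rho^\lambda(x). \tag{135}$$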



B. Encoder does not have access to Side Information 

Here the encoder does not receive any knowledge about the side information. In this section also, we make assumptions A1 and A2 and further assume finite state decoders (for reasons similar to those outlined in Note 6). For a fixed lookahead $d$ and $\lambda \ge 0$, we have the average cost optimality equation,



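\begin{align}
&\rho^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma) + h^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma) \notag\\
&\quad = \max_{a \in \mathcal{A}} \Big[ g^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma,a) + \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, A^{opt}(\tilde x))\, h^{\lambda}(u_2,\dots,u_{d+1},\tilde u,\tilde\beta,\tilde\gamma) \Big] \notag\\
&\qquad \forall\, (u_1,\dots,u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{M}),\ \gamma \in \mathcal{P}(\mathcal{N}), \tag{136}
\end{align}

where $\tilde\beta = \xi_m(\beta, u_1^{d+1}, \tilde u, a)$ and $\tilde\gamma = \xi_n(\gamma, u_1^{d+1}, \tilde u, a)$ are belief updates (cf. Appendix D) and $g^{\lambda}(\cdot)$ is the Lagrangian augmented cost,

\begin{align}
g^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma,a)
&= g(u_1,\dots,u_{d+1},\beta,\gamma,a) + \lambda \big(\Gamma - l(u_1,\dots,u_{d+1},\beta,\gamma,a)\big) \tag{137}\\
&= -\sum_{\tilde m \in \mathcal{M},\, \tilde n \in \mathcal{N},\, \tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} \beta(\tilde m)\,\gamma(\tilde n)\, P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, A^{opt}(\tilde x))\, \Lambda\big(u_2, U^{opt}(\tilde x,\tilde y,\tilde m,\tilde n)\big) \notag\\
&\quad + \lambda \Big(\Gamma - \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, C(A^{opt}(\tilde x))\Big). \tag{138}
\end{align}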

Let us denote the optimal distortion by $D^{NF}(d, \mathcal{M}, \mathcal{N})$ (NF standing for no feedback of side information symbols). We can now state the following theorem, whose proof is deferred to Appendix D.

Theorem 12: For a fixed lookahead $d$, suppose that $(\rho^{\lambda}(\cdot), h^{\lambda}(\cdot))$ solves the ACOE Eq. (136). Then the optimal average distortion is given by,

\[ D^{NF}(d,\mathcal{M},\mathcal{N}) = -\inf_{\lambda \ge 0}\ \sup_{x \in \mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{M}) \times \mathcal{P}(\mathcal{N})} \rho^{\lambda}(x). \tag{139} \]

Note 8: The ACOE in this section on sequential source coding with lookahead and a side information vending machine is amenable to computational solutions, as in Section V-B. Here also $D^{FB}(d, \mathcal{M}, \mathcal{N})$ can be computed exactly for increasing memory sizes, yielding non-trivial bounds on $D^{FB}(d)$.
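As an illustration of the computational route mentioned in Note 8, the following is a minimal sketch, not the procedure of Section V-B. It assumes a finite state and action space, a transition law under which the gain does not depend on the state, and a finite grid of multipliers. The names `gain`, `min_distortion`, `budget` and the toy numbers are hypothetical. For each multiplier it solves the ACOE with the Lagrangian augmented reward by relative value iteration, then takes the smallest gain over the grid.

```python
def gain(reward, P, iters=10000, tol=1e-12):
    """Relative value iteration for
    rho + h(s) = max_a [ reward[s][a] + sum_t P[a][s][t] * h(t) ].
    Returns the gain rho (assumed state-independent)."""
    n_s, n_a = len(reward), len(reward[0])
    h, rho = [0.0] * n_s, 0.0
    for _ in range(iters):
        th = [max(reward[s][a] + sum(P[a][s][t] * h[t] for t in range(n_s))
                  for a in range(n_a)) for s in range(n_s)]
        rho = th[0]                      # gain estimate at reference state 0
        h_new = [v - th[0] for v in th]  # keep h(0) = 0
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break
    return rho


def min_distortion(g, l, budget, P, lambdas):
    """Approximate -inf_lambda rho^lambda on a grid of multipliers, where the
    augmented reward is g + lambda * (budget - l)."""
    best = float("inf")
    for lam in lambdas:
        g_lam = [[g[s][a] + lam * (budget - l[s][a]) for a in range(len(g[0]))]
                 for s in range(len(g))]
        best = min(best, gain(g_lam, P))
    return -best


# Toy instance (hypothetical): action 0 costs nothing and yields distortion 0.5,
# action 1 costs 1 and yields distortion 0.1; the average cost budget is 0.5.
g = [[-0.5, -0.1], [-0.5, -0.1]]            # reward = -distortion
l = [[0.0, 1.0], [0.0, 1.0]]                # per-step action cost
P = [[[0.5, 0.5], [0.5, 0.5]]] * 2          # i.i.d. state transitions
lambdas = [k / 100 for k in range(201)]     # grid on [0, 2]
result = min_distortion(g, l, 0.5, P, lambdas)
```

In this toy instance the minimizing multiplier sits where the two actions tie, which corresponds to randomizing between them so that the cost budget is met with equality.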

VIII. Summary of the Results 

In this section we provide a summary of the various settings considered in this paper on real time communication with fixed finite lookahead at the encoder, and the transformations performed to cast the problem as a (constrained or unconstrained) Markov decision process. The methodology is to construct an average cost optimality equation (ACOE), and seek its solution. We have considered two classes of problems in this paper:

1) Real Time Communication, Fig. [1]: The problem is characterized by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g)$, the meaning of the various symbols being explained in Section [III]. The general ACOE is,

\[ \lambda + h(s) = \max_{a \in \mathcal{A}} \Big[ g(s,a) + \sum_{w \in \mathcal{W}} P_W(w \mid s,a)\, h(F(s,a,w)) \Big] \quad \forall\, s \in \mathcal{S}. \tag{140} \]

Note that in all the settings we considered, sup is replaced by max as the set of actions is finite. If there exist $\lambda^* \in \mathbb{R}$ and a bounded $h(\cdot)$ satisfying the above equation, then by Theorem [1] the minimum distortion is $-\lambda^*$. The following table exhibits the transformations, along with pointers to the equations in the paper, that cast the problem of Fig. [1] as an unconstrained Markov decision process:



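| Real-Time Communication, Fig. [1], Lookahead $d$ | Noise-Free Feedback, Complete Memory Decoding | Noise-Free Feedback, Finite Memory ($\mathcal{M}$) Decoder | No Feedback, Finite Memory ($\mathcal{M}$) Decoder |
|---|---|---|---|
| $\mathcal{S}$, state space | $\mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{U}^{d+1})$ | $\mathcal{U}^{d+1} \times \mathcal{M}$ | $\mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{M})$ |
| $\mathcal{A}$, action space | Mappings: $\mathcal{U}^{d+1} \to \mathcal{X}$ | Mappings: $\mathcal{U}^{d+1} \to \mathcal{X}$ | Mappings: $\mathcal{U}^{d+1} \to \mathcal{X}$ |
| $\mathcal{W}$, disturbance | $\mathcal{U}^{d+1} \times \mathcal{Y}$ | $\mathcal{U}^{d+1} \times \mathcal{Y}$ | $\mathcal{U}^{d+1}$ |
| $F(\cdot)$ | Eq. (39) | Eq. (88) | Eq. (109) |
| $P_W(\cdot \mid S, A)$ | Eq. (29) | Eq. (87) | Eq. (104) |
| $g(S,A)$, reward | Eq. (52) | Eq. (101) | Eq. (119) |
| ACOE | Eq. (53) | Eq. (102) | Eq. (118) |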
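
To make the action space in the table concrete, here is a small sketch that enumerates every mapping from $\mathcal{U}^{d+1}$ to $\mathcal{X}$ for finite alphabets. The function name `encoder_mappings` and the binary alphabets are illustrative assumptions. The number of actions is $|\mathcal{X}|^{|\mathcal{U}|^{d+1}}$, which is what the maximization in the ACOE ranges over.

```python
from itertools import product


def encoder_mappings(source_alphabet, channel_alphabet, d):
    """Yield every map a: U^{d+1} -> X as a dict keyed by (d+1)-tuples."""
    windows = list(product(source_alphabet, repeat=d + 1))
    for images in product(channel_alphabet, repeat=len(windows)):
        yield dict(zip(windows, images))


# Binary source and channel alphabets (assumed), lookahead d = 1.
mappings = list(encoder_mappings((0, 1), (0, 1), d=1))
num_actions = len(mappings)  # |X| ** (|U| ** (d + 1))
```

The count grows doubly exponentially in $d$, so exhaustive maximization over actions is only practical for small alphabets and short lookahead.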





2) Source Coding with a Side Information Vending Machine, Fig. [6]: The problem is characterized by the tuple $(\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g, l, \Gamma)$ explained in Section [III-A]. Here also the general ACOE is,

\[ \rho^{\lambda}(s) + h^{\lambda}(s) = \sup_{a \in \mathcal{A}} \Big[ g^{\lambda}(s,a) + \sum_{w \in \mathcal{W}} P_W(w \mid s,a)\, h^{\lambda}(F(s,a,w)) \Big] \quad \forall\, s \in \mathcal{S}, \tag{141} \]

where $\lambda$ is the Lagrangian parameter. The minimum distortion is given by Theorems 10, 11 and 12, respectively, for the cases tabulated below.



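| Source Coding with SI "Vendor", Fig. [6], Lookahead $d$ | Noise-Free Feedback, Complete Memory Decoding | Noise-Free Feedback, Finite Memory ($\mathcal{M}$, $\mathcal{N}$) Decoder | No Feedback, Finite Memory ($\mathcal{M}$, $\mathcal{N}$) Decoder |
|---|---|---|---|
| $\mathcal{S}$, state space | $\mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{U}^{d+1})$ | $\mathcal{U}^{d+1} \times \mathcal{M} \times \mathcal{N}$ | $\mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{M}) \times \mathcal{P}(\mathcal{N})$ |
| $\mathcal{A}$, action space | Mappings: $\mathcal{U}^{d+1} \times \mathcal{X} \to \mathcal{X} \times \mathcal{A}_v$ | Mappings: $\mathcal{U}^{d+1} \to \mathcal{X}$ | Mappings: $\mathcal{U}^{d+1} \to \mathcal{X}$ |
| $\mathcal{W}$, disturbance | $\mathcal{U}^{d+1} \times \mathcal{X} \times \mathcal{Y}$ | $\mathcal{U}^{d+1} \times \mathcal{X} \times \mathcal{Y}$ | $\mathcal{U}^{d+1}$ |
| $F(\cdot)$ | Eq. (166) (Appendix B) | Eq. (181) (Appendix C) | Eq. (199) (Appendix D) |
| $P_W(\cdot \mid S, A)$ | Eq. (162) (Appendix B) | Eq. (180) (Appendix C) | Eq. (194) (Appendix D) |
| $g^{\lambda}(S,A)$, Lagrangian augmented reward | Eq. (173) (Appendix B) | Eq. (190) (Appendix C) | Eq. (208) (Appendix D) |
| ACOE | Eq. (126) | Eq. (132) | Eq. (136) |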



IX. Conclusion 

In this paper, we consider an important class of problems in real time coding : a memoryless source is to be communicated 
over a memoryless channel, with sequential encoding and decoding and with a fixed finite lookahead of future symbols available 
at the encoder. Unit delay feedback may or may not be present, and decoding is based on the channel output symbols without 
delay, with or without a memory constraint. In all these scenarios, under the objective of minimizing the per-symbol distortion, 
we obtain average cost optimality equations whose solution yields the minimum achievable distortion, as well as sufficient 
conditions for the optimality of stationary policies. We contrast the minimum distortion at a fixed lookahead, with the best 
achievable with zero lookahead, where symbol by symbol encoding-decoding is optimal, and with the infinite lookahead case, 
for which the minimum achievable per symbol distortion is shown to coincide with that for the classical joint source channel 
coding problem, where separation is optimal. For the Bernoulli source and binary symmetric channel under Hamming loss, in 
case of finite state decoders, we compute exactly the minimum distortion values for various memory sizes, and study the upper 
bounds that they yield on the minimum distortion for a fixed lookahead in the absence of memory constraints. Answering 
the question "to look or not to lookahead", we characterize general conditions on the source and channel such that symbol 
by symbol encoding-decoding is optimal within the class of schemes of a given lookahead. We obtain and plot the region of source and channel parameters, in the case of a Bernoulli source, binary symmetric channel and Hamming distortion, where the symbol by symbol policy is strictly suboptimal. We then demonstrate that this framework of casting real time coding problems as Markov decision problems with average cost criteria can be useful in various other settings, by applying the same methodology to the source coding problem with a side information vending machine, where the encoder encodes the source sequentially, with a possible lookahead, and the decoder takes cost constrained actions to receive side information about the source. This setting is cast as a constrained Markov decision problem, and it is shown that a stationary randomized policy can attain the minimum per-symbol distortion, which is characterized as the solution to a saddle point equation.

Appendix A

In Section [II], the minimum expected distortion is defined as,



D(d) 



inf lim sup E 

{feJmJd} N^-OC 



1 N 



(142) 



Note that inf in the above definition can be replaced by min over the class of $(f_e, f_m, f_d)$-policies as outlined in Section [II]. This is argued by constructing an $(f_e, f_m, f_d)$-policy that achieves $D(d)$. Fix lookahead $d$. As $D(d)$ always exists (and is finite, due to our assumption that $\Lambda(\cdot,\cdot) \le \Lambda_{\max} < \infty$), for a positive non-increasing vanishing sequence $\{\epsilon_m\}_{m \ge 1}$ we can construct a sequence of policies $\{\mu_m\}_{m \ge 1}$ with expected average distortion $D_{\mu_m}(d)$, i.e.,



lim sup E, 



limsupD^ m (d) 



(143) 
(144) 



N- 



($E_{\mu_m}$ is the expectation with respect to the joint probability distribution induced when the policy used is $\mu_m$), such that $D_{\mu_m}(d)$ is a monotone non-increasing sequence converging to $D(d)$. By the definition of lim sup, for every $m \ge 1$, $\exists\, N_m(\epsilon_m)$ such that $\forall\, N \ge N_m$ ($N_m$ being a function of $\epsilon_m$ is implied henceforth),

\[ D^{N}_{\mu_m}(d) \le D_{\mu_m}(d) + \epsilon_m. \tag{145} \]

Now define a block-length sequence $\{l_m \in \mathbb{N}\}_{m \ge 1}$ satisfying the following requirements,



R1: $\dfrac{\sum_{i=1}^{m-1} l_i}{l_m} \to 0$ as $m \to \infty$.

R2: $l_m \ge \max\{N_m, N_{m+1}\}$ and $\dfrac{\max\{N_m, N_{m+1}\}}{l_m} \to 0$ as $m \to \infty$.

Note that we can always choose such a sequence, e.g. $l_m = \max\{N_m, N_{m+1}\}\big(\sum_{i=1}^{m-1} l_i\big)$. We define a block-coding scheme $\mu^*$ which operates with block length $l_i$ in the $i$-th block with scheme $\mu_i$. Operating this scheme for time $N \in \big(\sum_{i=1}^{m} l_i, \sum_{i=1}^{m+1} l_i\big]$ for some $m = m(N)$ (note $m(N) \to \infty$ as $N \to \infty$), we can bound the normalized distortion as:

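(Case 1) $N - \sum_{i=1}^{m} l_i < N_{m+1}$:

\begin{align}
D^{N}_{\mu^*}(d) &\overset{(a)}{\le} \Big(\frac{\sum_{i=1}^{m-1} l_i}{N}\Big)\Lambda_{\max} + \frac{l_m}{N}\big(D_{\mu_m}(d) + \epsilon_m\big) + \Big(\frac{N - \sum_{i=1}^{m} l_i}{N}\Big)\Lambda_{\max} \tag{146}\\
&\le \Big(\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{N}\Big)\Lambda_{\max} + \frac{l_m}{N}\big(D_{\mu_m}(d) + \epsilon_m\big) \tag{147}\\
&\le \Big(\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Big)\Lambda_{\max} + D_{\mu_m}(d) + \epsilon_m, \tag{148}
\end{align}

where (a) is due to the fact that $l_m \ge N_m$ (requirement R2) and hence the distortion in block $m$ is bounded above by $D_{\mu_m}(d) + \epsilon_m$.

(Case 2) $N - \sum_{i=1}^{m} l_i \ge N_{m+1}$:

\begin{align}
D^{N}_{\mu^*}(d) &\overset{(b)}{\le} \Big(\frac{\sum_{i=1}^{m-1} l_i}{N}\Big)\Lambda_{\max} + \frac{l_m}{N}\big(D_{\mu_m}(d) + \epsilon_m\big) + \Big(\frac{N - \sum_{i=1}^{m} l_i}{N}\Big)\big(D_{\mu_{m+1}}(d) + \epsilon_{m+1}\big) \tag{149}\\
&\overset{(c)}{\le} \Big(\frac{\sum_{i=1}^{m-1} l_i}{N}\Big)\Lambda_{\max} + \frac{l_m}{N}\big(D_{\mu_m}(d) + \epsilon_m\big) + \Big(\frac{N - \sum_{i=1}^{m} l_i}{N}\Big)\big(D_{\mu_m}(d) + \epsilon_m\big) \tag{150}\\
&\le \Big(\frac{\sum_{i=1}^{m-1} l_i}{N}\Big)\Lambda_{\max} + D_{\mu_m}(d) + \epsilon_m \tag{151}\\
&\le \Big(\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Big)\Lambda_{\max} + D_{\mu_m}(d) + \epsilon_m, \tag{152}
\end{align}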

where (b) follows from bounding the distortion in block $m$ as in (a) and similarly, as $N - \sum_{i=1}^{m} l_i \ge N_{m+1}$, bounding the distortion in block $m+1$ by $D_{\mu_{m+1}}(d) + \epsilon_{m+1}$, and (c) follows from the fact that both $D_{\mu_m}(d)$ and $\epsilon_m$ are non-increasing sequences.

Thus we see that in both the above cases, for any time $N$, the normalized distortion is bounded above as,

\[ D^{N}_{\mu^*}(d) \le \Big(\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Big)\Lambda_{\max} + D_{\mu_m}(d) + \epsilon_m, \tag{153} \]

which implies that the expected average distortion under policy $\mu^*$ is,

\begin{align}
D_{\mu^*}(d) &\le \limsup_{N \to \infty} \Big[ \Big(\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Big)\Lambda_{\max} + D_{\mu_m}(d) + \epsilon_m \Big] \tag{154}\\
&\le \limsup_{N \to \infty} \Big(\frac{\sum_{i=1}^{m-1} l_i + N_{m+1}}{l_m}\Big)\Lambda_{\max} + \limsup_{N \to \infty} \big(D_{\mu_m}(d) + \epsilon_m\big) \tag{155}\\
&\overset{(d)}{=} D(d), \tag{156}
\end{align}

where (d) follows from the fact that $\frac{\sum_{i=1}^{m-1} l_i}{l_m} \to 0$ and $\frac{N_{m+1}}{l_m} \to 0$ by requirements R1 and R2 respectively, since $m(N) \to \infty$ as $N \to \infty$. Thus we have a scheme $\mu^*$ with expected distortion $D_{\mu^*}(d) \le D(d)$, but we know that for any scheme $\mu$, $D(d) \le D_{\mu}(d)$, implying $D_{\mu^*}(d) = D(d)$.



Appendix B 
Proof of Theorem [10]

We will first obtain the ACOE Eq. (126). Define the state sequence $S_i = (V_i, \{P_{V_i = v \mid X^i, Y^i}\}_{v \in \mathcal{V}})$ and the disturbance sequence $W_i = (V_i, X_i, Y_i)$. The action sequence $A_i = \{f_{e,i}(v, V^{i-1}, Y^{i-1}), f_{v,i}(x, X^{i-1})\}_{v \in \mathcal{V}, x \in \mathcal{X}}$ is clearly a history dependent action, i.e. a function of $W^{i-1} = (V^{i-1}, X^{i-1}, Y^{i-1})$. We will now verify the conditions for the defined state sequence, disturbance and action sequence to form a controlled Markov process. With some abuse of notation, we denote,



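\begin{align}
\beta_i &= \{P_{V_i = v \mid X^i, Y^i}\}_{v \in \mathcal{V}} \tag{157}\\
A_{e,i} &= f_{e,i}(\cdot, V^{i-1}, Y^{i-1}) \tag{158}\\
A_{v,i} &= f_{v,i}(\cdot, X^{i-1}). \tag{159}
\end{align}

Now,

\begin{align}
P(W_i \mid W^{i-1}, S^{i-1}, A^{i}) &= P(V_i, X_i, Y_i \mid W^{i-1}, S^{i-1}, A^{i}) \tag{160}\\
&= K(V_{i-1}, V_i)\, \mathbf{1}_{\{X_i = A_{e,i}(V_i)\}}\, P(Y_i \mid V_i, A_{v,i}(X_i)) \tag{161}\\
&= P_W(W_i \mid S_{i-1}, A_i), \tag{162}
\end{align}

and

\begin{align}
\beta_i &= \left\{ \frac{\sum_{v_{i-1}} \beta_{i-1}(v_{i-1})\, K(v_{i-1}, v)\, \mathbf{1}_{\{X_i = A_{e,i}(v)\}}\, P(Y_i \mid v, A_{v,i}(X_i))}{\sum_{v_{i-1}, \tilde v} \beta_{i-1}(v_{i-1})\, K(v_{i-1}, \tilde v)\, \mathbf{1}_{\{X_i = A_{e,i}(\tilde v)\}}\, P(Y_i \mid \tilde v, A_{v,i}(X_i))} \right\}_{v \in \mathcal{V}} \tag{163}\\
&= G(\beta_{i-1}, A_{e,i}, A_{v,i}, V_i, X_i, Y_i) \tag{164}\\
&= G(\beta_{i-1}, A_i, W_i), \tag{165}
\end{align}

which implies,

\[ S_i = F(S_{i-1}, A_i, W_i). \tag{166} \]

The optimal decoding is $V_{Bayes}(P_{V_i \mid X^i, Y^i})$. Also let $g(S_{i-1}, A_i) = -E\big[\Lambda\big(V_i, V_{Bayes}(P_{V_i \mid X^i, Y^i})\big) \mid V^{i-1}, X^{i-1}, Y^{i-1}, A^{i}\big]$,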


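so that we have

\[ \inf \limsup_{N \to \infty} E\Big[\frac{1}{N} \sum_{i=1}^{N} \Lambda\big(V_i, V_{Bayes}(X^i, Y^i)\big)\Big] = -\sup \liminf_{N \to \infty} E\Big[\frac{1}{N} \sum_{i=1}^{N} g(S_{i-1}, A_i)\Big]. \tag{167} \]

Also for the cost constraint on action,

\begin{align}
E\big[C(A_{v,i}(X_i)) \mid V^{i-1}, X^{i-1}, Y^{i-1}, A^{i}\big] &= \sum_{v \in \mathcal{V},\, \tilde x \in \mathcal{X}} K(V_{i-1}, v)\, \mathbf{1}_{\{\tilde x = A_{e,i}(v)\}}\, C(A_{v,i}(\tilde x)) \tag{168}\\
&= l(S_{i-1}, A_i), \tag{169}
\end{align}

which imply,

\[ \limsup_{N \to \infty} E\Big[\frac{1}{N} \sum_{i=1}^{N} C(A_{v,i}(X_i))\Big] = \limsup_{N \to \infty} E\Big[\frac{1}{N} \sum_{i=1}^{N} l(S_{i-1}, A_i)\Big]. \tag{170} \]

Thus the problem of minimizing the average distortion subject to constraints on the vending action is equivalent to a constrained Markov decision process, $(\mathcal{S}, \mathcal{A}, \mathcal{W}, F, P_S, P_W, g, l, \Gamma)$ (note that here the number of constraints is $k = 1$). Fix a lookahead $d$. Let $\beta_1$ denote the marginal of belief $\beta$ with respect to the first argument. Now for fixed $\lambda \ge 0$, we have the average cost optimality equation as,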

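\begin{align}
&\rho^{\lambda}(u_1,\dots,u_{d+1},\beta) + h^{\lambda}(u_1,\dots,u_{d+1},\beta) \notag\\
&\quad = \max_{(a_e, a_v)} \Big[ g^{\lambda}(u_1,\dots,u_{d+1},\beta,a_e,a_v) + \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a_e(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, a_v(\tilde x))\, h^{\lambda}(u_2,\dots,u_{d+1},\tilde u,\tilde\beta) \Big] \notag\\
&\qquad \forall\, (u_1,\dots,u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{U}^{d+1}), \tag{171}
\end{align}

where $\tilde\beta = G(\beta, a_e, a_v, (u^{d+1}, \tilde u, \tilde x, \tilde y))$ is the updated belief, $(a_e, a_v) \in \mathcal{A}_e \times \mathcal{A}_v$ and $g^{\lambda}(\cdot)$ is the Lagrangian augmented cost,

\begin{align}
g^{\lambda}(u_1,\dots,u_{d+1},\beta,a_e,a_v)
&= g(u_1,\dots,u_{d+1},\beta,a_e,a_v) + \lambda \big(\Gamma - l(u_1,\dots,u_{d+1},\beta,a_e,a_v)\big) \tag{172}\\
&= -\min_{\hat u} \sum_{u \in \mathcal{U}} \beta_1(u)\, \Lambda(u, \hat u) + \lambda \Big(\Gamma - \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a_e(u_2,\dots,u_{d+1},\tilde u)\}}\, C(a_v(\tilde x))\Big). \tag{173}
\end{align}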

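Now having obtained the ACOE, the proof is an application of Theorem [2] stated in Section III-A. We need merely verify the conditions:

C1 holds as the state space and action space are both compact subsets of Borel spaces.

C2 holds because of our definitions of $g(\cdot)$, $l(\cdot)$ and assumptions on cost and distortion constraints.

C3 Denoting the state by $s = (u_1,\dots,u_{d+1},\beta)$ and the action by $a = (a_e, a_v)$, we have the stochastic kernel,

\begin{align}
Q(\tilde s \mid s, a) &= P(\tilde u)\, \mathbf{1}_{\{\tilde x = a_e(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, a_v(\tilde x))\, \mathbf{1}_{\{\tilde\beta = G(\beta, a, (u^{d+1}, \tilde u, \tilde x, \tilde y))\}} \quad \text{if } \tilde s = (u_2,\dots,u_{d+1},\tilde u,\tilde\beta), \tag{174}\\
&= 0 \quad \text{otherwise}. \tag{175}
\end{align}

Fix the tuple $(u^{d+1}, a)$, which takes values in a finite set. Consider a sequence $\beta_n \to \beta$. Let $\mu_n$ and $\mu$ be the measures on $\mathcal{B}(\mathcal{S})$ induced by $Q(\cdot \mid u^{d+1}, \beta_n, a)$ and $Q(\cdot \mid u^{d+1}, \beta, a)$ respectively. Proving C3 is equivalent to proving that $\forall\, h \in C_b(\mathcal{S})$ we have $\mu_n(h) \to \mu(h)$, i.e.

\begin{align}
&\sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a_e(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, a_v(\tilde x))\, h\big(F(\beta_n, a, (u^{d+1}, \tilde u, \tilde x, \tilde y))\big) \notag\\
&\quad \to \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a_e(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, a_v(\tilde x))\, h\big(F(\beta, a, (u^{d+1}, \tilde u, \tilde x, \tilde y))\big), \tag{176}
\end{align}

which is true as $F(\cdot)$ (by its definition, Eq. (166)) is continuous in its arguments.

C4 (Slater's Condition) We need to show that there exists a policy such that the constraint on the vending action is strictly satisfied, but this is trivially true as we can select a policy with $A_{v,i}(\cdot)$ such that $C(A_{v,i}) = 0$ $\forall\, i$, which satisfies Slater's condition. Thus, C1-C4 being true, the optimal distortion $D^{FB}(d)$ is,

\begin{align}
D^{FB}(d) = -\rho^* &\overset{(a)}{=} -\sup_{(\nu,\pi) \in \mathcal{P}(\mathcal{S}) \times \Pi}\ \inf_{\lambda \ge 0} L((\nu,\pi),\lambda) \tag{177}\\
&\overset{(b)}{=} -\inf_{\lambda \ge 0}\ \sup_{(\nu,\pi) \in \mathcal{P}(\mathcal{S}) \times \Pi} L((\nu,\pi),\lambda) \tag{178}\\
&\overset{(c)}{=} -\inf_{\lambda \ge 0}\ \sup_{x \in \mathcal{U}^{d+1} \times \mathcal{P}(\mathcal{U}^{d+1})} \rho^{\lambda}(x), \tag{179}
\end{align}

where (a) follows from the definition of $\rho^*$, (b) follows from Theorem [2] (note that assumptions C1-C4 are satisfied here as proved above), while (c) follows as $(\rho^{\lambda}(\cdot), h^{\lambda}(\cdot))$ solve the ACOE.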
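
The belief update $G(\cdot)$ used in this proof is a one-step Bayes filter. The following is a minimal sketch under simplifying assumptions: finite alphabets, dictionaries standing in for the kernel $K$ and the actions, and a hypothetical function name `belief_update`. It multiplies the propagated prior by the indicator that the encoder output matches and by the side information likelihood, then normalizes.

```python
def belief_update(beta, K, a_e, a_v, p_y, x, y):
    """One Bayes step for the belief over the current source state.

    beta: dict v -> probability of the previous state
    K:    dict (v_prev, v) -> transition probability
    a_e:  dict v -> channel input chosen by the encoder action
    a_v:  dict x -> vending action chosen after seeing x
    p_y:  function (y, v, action) -> probability of side information y
    """
    action = a_v[x]
    unnorm = {}
    for v in beta:
        if a_e[v] != x:  # the indicator 1{x = a_e(v)} rules this state out
            unnorm[v] = 0.0
            continue
        prior = sum(beta[vp] * K[(vp, v)] for vp in beta)
        unnorm[v] = prior * p_y(y, v, action)
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation (x, y) has zero probability under beta")
    return {v: w / total for v, w in unnorm.items()}


# Toy instance (hypothetical): uniform i.i.d. binary source, an encoder that
# always sends 0, and side information through a BSC with crossover 0.1.
beta0 = {0: 0.5, 1: 0.5}
K = {(vp, v): 0.5 for vp in (0, 1) for v in (0, 1)}
bsc = lambda y, v, action: 0.9 if y == v else 0.1
posterior = belief_update(beta0, K, {0: 0, 1: 0}, {0: 1}, bsc, x=0, y=1)
```

Because the encoder output is uninformative in this toy instance, the posterior is driven entirely by the side information symbol.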

Appendix C 
Proof of Theorem [11]

Define the state sequence as $S_i = (V_i, M_i, N_i)$ and the disturbance sequence $W_i = (V_i, X_i, Y_i)$. We will first derive the ACOE Eq. (132). The action (encoder's control) sequence is history dependent, $A_i = f_{e,i}(\cdot, W^{i-1})$. (Note that here $A_i$ is the encoding action while $A^{opt}$ is the action taken by the decoder to observe side information.) It can be easily established, as in previous sections, that



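\[ P(W_i \mid W^{i-1}, S^{i-1}, A^{i}) = P_W(W_i \mid S_{i-1}, A_i) = K(V_{i-1}, V_i)\, \mathbf{1}_{\{X_i = A_i(V_i)\}}\, P(Y_i \mid V_i, A^{opt}(X_i)). \tag{180} \]

We have $X_i = A_i(V_i)$, $M_i = f_m(M_{i-1}, X_i)$, $N_i = f_n(N_{i-1}, Y_i)$, and hence

\[ S_i = F(S_{i-1}, A_i, W_i). \tag{181} \]

By the assumptions in Section [VII-A2] the decoding is stationary, hence we have,

\begin{align}
&E\big[\Lambda(V_i, V^{opt}) \mid V^{i-1}, M^{i-1}, N^{i-1}, Y^{i-1}, A^{i}\big] \tag{182}\\
&\quad = \sum_{v \in \mathcal{V},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} K(V_{i-1}, v)\, \mathbf{1}_{\{\tilde x = A_i(v)\}}\, P(\tilde y \mid v, A^{opt}(\tilde x))\, \Lambda\big(v, V^{opt}(\tilde x, \tilde y, M_{i-1}, N_{i-1})\big) \tag{183}\\
&\quad = -g(S_{i-1}, A_i). \tag{184}
\end{align}

For the cost constraints we have,

\begin{align}
&E\big[C(A^{opt}(X_i)) \mid V^{i-1}, M^{i-1}, N^{i-1}, Y^{i-1}, A^{i}\big] \tag{185}\\
&\quad = \sum_{v \in \mathcal{V},\, \tilde x \in \mathcal{X}} K(V_{i-1}, v)\, \mathbf{1}_{\{\tilde x = A_i(v)\}}\, C(A^{opt}(\tilde x)) \tag{186}\\
&\quad = l(S_{i-1}, A_i). \tag{187}
\end{align}

Fix a lookahead $d$. Now for fixed $\lambda \ge 0$, the average cost optimality equation is,

\begin{align}
&\rho^{\lambda}(u_1,\dots,u_{d+1},m,n) + h^{\lambda}(u_1,\dots,u_{d+1},m,n) \notag\\
&\quad = \max_{a \in \mathcal{A}} \Big[ g^{\lambda}(u_1,\dots,u_{d+1},m,n,a) + \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, A^{opt}(\tilde x))\, h^{\lambda}(u_2,\dots,u_{d+1},\tilde u,\tilde m,\tilde n) \Big] \notag\\
&\qquad \forall\, (u_1,\dots,u_{d+1}) \in \mathcal{U}^{d+1},\ m \in \mathcal{M},\ n \in \mathcal{N}, \tag{188}
\end{align}

where $\tilde m = f_m(m, \tilde x)$ and $\tilde n = f_n(n, \tilde y)$ are memory updates and $g^{\lambda}(\cdot)$ is the Lagrangian augmented cost,

\begin{align}
g^{\lambda}(u_1,\dots,u_{d+1},m,n,a)
&= g(u_1,\dots,u_{d+1},m,n,a) + \lambda \big(\Gamma - l(u_1,\dots,u_{d+1},m,n,a)\big) \tag{189}\\
&= -\sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, A^{opt}(\tilde x))\, \Lambda\big(u_2, U^{opt}_{Bayes}(\tilde x,\tilde y,m,n)\big) \notag\\
&\quad + \lambda \Big(\Gamma - \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, C(A^{opt}(\tilde x))\Big). \tag{190}
\end{align}

Once we have the ACOE, the rest of the proof is similar to the proof of Theorem 10, by invoking Theorem [2].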

Appendix D 
Proof of Theorem [12]

The proofs of this section follow in line with the previous sections. We just need to establish the ACOE Eq. (136); the rest of the proof follows by invoking Theorem [2]. Define the following:



\begin{align}
S_i &= (V_i, P_{M_i \mid V^i}, P_{N_i \mid V^i}) \tag{191}\\
W_i &= V_i \tag{192}
\end{align}
(193) 



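Let us use the following notation, $\beta_i = P(M_i \mid V^i)$ and $\gamma_i = P(N_i \mid V^i)$. It is easy to see (along the lines of the analysis in previous sections) that

\[ P(W_i \mid W^{i-1}, S^{i-1}, A^{i}) = P(W_i \mid S_{i-1}, A_i) = K(V_{i-1}, V_i), \tag{194} \]

and,

\begin{align}
\beta_i &= \Big\{ \sum_{m_{i-1},\, x_i} \beta_{i-1}(m_{i-1})\, K(V_{i-1}, V_i)\, \mathbf{1}_{\{x_i = A_i(V_i)\}}\, \mathbf{1}_{\{m_i = f_m(x_i, m_{i-1})\}} \Big\}_{m_i \in \mathcal{M}} \tag{195}\\
&= \xi_m(\beta_{i-1}, V_{i-1}, V_i, A_i), \tag{196}\\
\gamma_i &= \Big\{ \sum_{n_{i-1},\, x_i,\, y_i} \gamma_{i-1}(n_{i-1})\, K(V_{i-1}, V_i)\, \mathbf{1}_{\{x_i = A_i(V_i)\}}\, P(y_i \mid V_i, A^{opt}(x_i))\, \mathbf{1}_{\{n_i = f_n(y_i, n_{i-1})\}} \Big\}_{n_i \in \mathcal{N}} \tag{197}\\
&= \xi_n(\gamma_{i-1}, V_{i-1}, V_i, A_i), \tag{198}
\end{align}

which imply that

\[ S_i = F(S_{i-1}, A_i, W_i). \tag{199} \]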

Also for constraints, 

E \^K{V i ,V^\X i ,Y i ,M i _ 1 ,N i _ 1 ))\V i - 1 ^-\ 1 i -\Y i - 1 ,A i '\ (200) 

ft_i(m)7i-i(»i)ir(VU,^ (201) 

meM,heAf ,aev .xex ,yey 

= -.9(^-1, A-1,7*-!,^) (202) 

= -giS^Ai), (203) 



and 



KM-uVn^^wyCiAFW) 

l(Si-i,Ai). 



(204) 
(205) 



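After these transformations, for a fixed lookahead $d$ and $\lambda \ge 0$, we have the average cost optimality equation,

\begin{align}
&\rho^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma) + h^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma) \notag\\
&\quad = \max_{a \in \mathcal{A}} \Big[ g^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma,a) + \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, A^{opt}(\tilde x))\, h^{\lambda}(u_2,\dots,u_{d+1},\tilde u,\tilde\beta,\tilde\gamma) \Big] \notag\\
&\qquad \forall\, (u_1,\dots,u_{d+1}) \in \mathcal{U}^{d+1},\ \beta \in \mathcal{P}(\mathcal{M}),\ \gamma \in \mathcal{P}(\mathcal{N}), \tag{206}
\end{align}

where $\tilde\beta = \xi_m(\beta, u_1^{d+1}, \tilde u, a)$ and $\tilde\gamma = \xi_n(\gamma, u_1^{d+1}, \tilde u, a)$ are belief updates and $g^{\lambda}(\cdot)$ is the Lagrangian augmented cost,

\begin{align}
g^{\lambda}(u_1,\dots,u_{d+1},\beta,\gamma,a)
&= g(u_1,\dots,u_{d+1},\beta,\gamma,a) + \lambda \big(\Gamma - l(u_1,\dots,u_{d+1},\beta,\gamma,a)\big) \tag{207}\\
&= -\sum_{\tilde m \in \mathcal{M},\, \tilde n \in \mathcal{N},\, \tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X},\, \tilde y \in \mathcal{Y}} \beta(\tilde m)\,\gamma(\tilde n)\, P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, P(\tilde y \mid u_2, A^{opt}(\tilde x))\, \Lambda\big(u_2, U^{opt}(\tilde x,\tilde y,\tilde m,\tilde n)\big) \notag\\
&\quad + \lambda \Big(\Gamma - \sum_{\tilde u \in \mathcal{U},\, \tilde x \in \mathcal{X}} P(\tilde u)\, \mathbf{1}_{\{\tilde x = a(u_2,\dots,u_{d+1},\tilde u)\}}\, C(A^{opt}(\tilde x))\Big); \tag{208}
\end{align}

thus the ACOE Eq. (136) is established.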



